Research

My research group, schwa lab, focuses on computational linguistics and works on practical applications of text mining in a number of domains. A popular-level overview of my research is given in this University News article.

Financial Text Mining

In collaboration with Fairfax Media, Australia's leading provider of online news and classifieds, we work on the 'Computable News' project. This project is developing state-of-the-art approaches to entity linking and event tracking to help journalists, readers and analysts find relevant stories and research news archives. We are currently exploring named entity linking and user-driven event extraction.

Web-scale Parsing

The web is a challenging arena for syntactic parsing because of its scale and its variety of styles, genres, and domains. We are working on scaling and adapting an existing wide-coverage parser to the web; evaluating and running this parser on Wikipedia, a large and semi-structured text collection; using the parsed wiki data for an innovative form of bootstrapping to make the parser both more efficient and more accurate; and finally using the parsed web data for a variety of NLP semantic tasks, including a novel combination of distributional and compositional semantics to improve performance on tasks that require fine-grained syntax/semantics integration.

CCG Parsing

Along with Stephen Clark, I have developed a wide-coverage parser for Combinatory Categorial Grammar (CCG). Estimating the maximum entropy parsing models is a very computationally intensive task requiring an efficient distributed implementation on a large Beowulf cluster. The parser performs with state-of-the-art accuracy but parses much faster than other linguistically motivated parsers. We are working on improving parsing accuracy and on techniques for porting the parser to new domains.
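
To give a rough sense of why estimating a maximum entropy model distributes well, the sketch below computes the log-likelihood and gradient of a generic log-linear model over separate shards of training data and then sums the results. It is an illustration only, not the parser's actual implementation: the function names (shard_gradient, combine), the dictionary feature representation, and the toy data format are all assumptions made for this example, and in a real cluster each shard would be processed on a different node.

```python
import math


def shard_gradient(weights, shard):
    """Log-likelihood and gradient contribution of one shard of data.

    Hypothetical data format: each example is (candidates, gold_index),
    where candidates is a list of feature-count dicts, one per candidate
    analysis of a sentence.
    """
    log_lik = 0.0
    grad = {}
    for candidates, gold in shard:
        # Score every candidate analysis under the current weights.
        scores = [
            sum(weights.get(f, 0.0) * v for f, v in c.items())
            for c in candidates
        ]
        # Log of the normalising constant, computed stably.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        log_lik += scores[gold] - log_z
        # Empirical feature counts from the gold analysis.
        for f, v in candidates[gold].items():
            grad[f] = grad.get(f, 0.0) + v
        # Subtract expected feature counts under the model.
        for c, s in zip(candidates, scores):
            p = math.exp(s - log_z)
            for f, v in c.items():
                grad[f] = grad.get(f, 0.0) - p * v
    return log_lik, grad


def combine(results):
    """Sum per-shard (log_likelihood, gradient) pairs into global totals."""
    total_ll = 0.0
    total_grad = {}
    for ll, grad in results:
        total_ll += ll
        for f, g in grad.items():
            total_grad[f] = total_grad.get(f, 0.0) + g
    return total_ll, total_grad


# Toy data: two sentences, each with two candidate analyses.
data = [
    ([{"a": 1.0}, {"b": 1.0}], 0),
    ([{"a": 1.0, "b": 1.0}, {"b": 2.0}], 1),
]
weights = {}
whole = shard_gradient(weights, data)
parts = combine([shard_gradient(weights, data[:1]),
                 shard_gradient(weights, data[1:])])
```

Because both the objective and its gradient are sums over training examples, the sharded result in parts matches the single-pass result in whole; only the summed vectors need to be exchanged between nodes on each iteration of the optimiser.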

Scientific Text Mining

We are developing systems for exploiting the large and ever-increasing volumes of scientific literature. In particular, we are focusing on the development of tools and systems for extracting information and answering questions in two domains: Astronomy and Genomics.