Jon Patrick and Scamseek awarded 2005 Australian Computer Society Eureka Prize for ICT Innovation





Final Report


The Scamseek Project - Text Mining for Financial Scams on the Internet

Final report - short version PDF (2 pages)

Final report - long version PDF (6 pages)

The Scamseek project has a $2.2 million budget to build a surveillance tool for identifying financial scams on the Internet. It is funded by the Capital Markets CRC, the Australian Securities and Investments Commission, and the participating universities. This is Australia’s largest research project in language technology.

The project now has two phases. Phase 1, called ScamAlert, aims to classify Internet pages. Two principal types of document are of concern: those in which unregistered advisors give financial advice, and those promoting illegal investment schemes. The system has two major features. First, linguists analyse documents from known scams to identify the features that make them distinctive. Second, machine-learning strategies are used to analyse the documents, both to derive other features that may be useful in classification and to extract named entities. The results of the linguistic and machine-learning investigations are combined to create a unified document classifier. The classifier is fed by a web spider that searches the Internet for potential scam sites 24 hours a day, 7 days a week.
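The idea of merging linguist-identified features with machine-derived ones into a single classifier can be illustrated with a minimal sketch. Nothing here is taken from the Scamseek system itself: the cue phrases in `LINGUISTIC_CUES`, the toy training documents, the `scam`/`ok` labels, and the choice of a multinomial Naive Bayes model are all assumptions made for illustration, since the project description does not say which features or learning algorithms were used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical cue phrases standing in for features a linguist might identify.
LINGUISTIC_CUES = ("guaranteed return", "risk free", "act now")


def extract_features(text):
    """Merge bag-of-words token counts with hand-written cue-phrase features."""
    lowered = text.lower()
    features = Counter("tok=" + t for t in re.findall(r"[a-z']+", lowered))
    for cue in LINGUISTIC_CUES:
        if cue in lowered:
            features["cue=" + cue] += 1
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over the merged features."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            feats = extract_features(text)
            self.label_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat, count in feats.items():
                # Unseen features fall back to the smoothed count of 1.
                score += count * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch.
TRAIN = [
    ("Guaranteed return of 20% a month, risk free. Act now!", "scam"),
    ("Double your money with this secret offshore scheme, guaranteed return", "scam"),
    ("The company released its audited annual report to shareholders", "ok"),
    ("Licensed adviser discusses long term diversified portfolio risk", "ok"),
]

model = NaiveBayes().fit([text for text, _ in TRAIN], [label for _, label in TRAIN])
print(model.predict("Act now for a guaranteed return, risk free"))
```

In a pipeline like the one described, a web spider would supply the page text passed to `predict`; the sketch leaves crawling and named-entity extraction out entirely.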

Phase 2 aims to widen the scope of the material investigated and to improve the performance of the classifiers.

Capital Markets Co-operative Research Centre (CMCRC)
Australian Securities and Investments Commission (ASIC)
University of Sydney
Macquarie University