Project description & progress report
The Scamseek Project - Text Mining for Financial Scams on the Internet

Final report - short version PDF (2 pages)
Final report - long version PDF (6 pages)

The Scamseek
project has a $2.2 million budget to build a surveillance tool for identifying
financial scams on the Internet. It is funded by the Capital Markets CRC,
the Australian Securities and Investments Commission and the participating
universities. This is Australia's largest research project in language
technology. The project now has two phases. Phase 1, called ScamAlert,
aims to perform document classification of Internet pages. There are two
principal types of documents of concern: those that give financial advice
from unregistered advisors, and those that offer illegal investment schemes. The system has
two major components. Firstly, documents from known scams are analysed by
linguists to identify the features that make them distinctive. Secondly,
machine-learning strategies are used to analyse the documents to derive
other features that may be useful in classification and to extract named
entities. The results of the linguistic and machine learning investigations
are combined to create a unified document classifier. The classifier is
fed by a web spider that searches the Internet 24 hours a day, 7 days a week
for potential scam sites.

Phase 2 aims to widen the scope of the materials investigated and to improve the classifiers so that they perform to a higher standard.

Partners