Basser Seminar Series

The Scamseek Project - Identifying Financial Scams on the Internet

Jon Patrick (University of Sydney)

Wednesday 6 April 2005, 2-3pm

Basser Conference Room (Madsen Room G92)

Abstract

The Scamseek project had the principal objective of building an industrially viable system that retrieves candidate scam texts from the Internet and classifies them according to their risk of containing an illegal investment proposal or illegal investment advice. The value of the system lies in the significant time and efficiency savings it gives the human analyst. The project was developed in two stages over 15 months. It produced multiple classifiers for different types of data, achieved higher than expected performance statistics on classification, and was completed on time and under budget.

The development of the system required the solution of two major problems in document classification, namely accurate identification of classes with very small footprints (less than 0.1% of documents), and classification using intended meaning rather than word strings. The approach taken used a semantic model of language, Systemic Functional Grammar, to model the semantics of the scam classes, and used unigrams with significant language pre-processing to assist in separating out irrelevant documents. Litigation has been initiated by ASIC from classifications made by the very first production run of the system. ASIC can operate the system on a 24/7 basis. The estimated saving in human analyst effort in ASIC's monitoring role is of the order of 100-fold. The savings to the community from speedier detection of, and intervention in, scams cannot be estimated readily but are likely to be of the order of tens of millions of dollars per annum.

The Scamseek project is the largest computational linguistics research project conducted in Australia, with a total budget of $2.2M. It was commissioned by the Australian Securities and Investments Commission (ASIC) and funded through the University of Sydney, Macquarie University, ASIC, Capital Markets CRC and AC3.

Speaker’s biography

Professor Jon Patrick holds the Chair of Language Technology at the University of Sydney. He has worked on the computation of language since 1982, when he built the first systems to capture verbal descriptions of team sports in real time. This work was later expanded into a generic system for any behavioural events. He has subsequently tackled wider problems of capturing meta-descriptions of behavioural events, especially in the field of psychotherapy, including commentary by experts on therapist training sessions. He is a registered psychologist and has practised as a therapist. More recently he has studied the Basque language and produced the first comprehensive student reference text of Basque grammar in English. He has researched the automated use of static resources, such as dictionaries and grammar descriptions, in training systems for second language learning. He now concentrates on developing methods for using Systemic Functional Grammar in the analysis of meaning in texts.