Basser Seminar Series

Foundations for Cluster Validity

Speaker: Professor Vladimir Estivill-Castro
School of Information and Communication Technology, Griffith University

Time: Tuesday 28 September 2010, 4:00-5:00pm (Note: different day to usual.)
Refreshments will be available from 3:30pm

Location: The University of Sydney, School of IT Building, Lecture Theatre (Room 123), Level 1


Abstract

This talk will introduce, and give an intuition for, the two major approaches to machine learning: supervised learning and unsupervised learning. Building on this general introduction, we will show a link between the two learning settings. The talk will also explain the algorithm-analysis technique of instance easiness, which has strong links to the parameterized analysis of algorithms.

We approach this topic because "the statistical problem of testing cluster validity is essentially unsolved" [DudaHS01]. We present an approach that translates the problem of establishing the credibility of the output of unsupervised learning algorithms into the supervised learning setting.

We achieve this by introducing a notion of instance easiness for supervised learning and linking the validity of a clustering to how easy an instance its output is for supervised learning. Our notion of instance easiness for supervised learning extends the notion of stability under perturbations, which has been used to measure clusterability in the unsupervised setting. We show that this approach is applicable in practice, and we follow the axiomatic and generic formulations for cluster-quality measures. As a result, we have an effective method for judging how much trust we can place in a clustering result. Moreover, the proposed method benefits from the now-standard validation methods for supervised learning, such as V-fold cross-validation.
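As a rough illustration of the idea only, and not the speaker's actual method, the sketch below treats the labels produced by a clustering as class labels, trains a classifier on them, and measures V-fold cross-validated accuracy on perturbed held-out points. Everything specific here is an assumption made for the example: the tiny k-means routine, the nearest-centroid classifier, the Gaussian perturbation and its scale, the synthetic data, and the names `kmeans` and `easiness`.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def nearest(p, centers):
    """Key of the center (in a dict label -> point) closest to p."""
    return min(centers, key=lambda k: dist2(p, centers[k]))

def kmeans(points, k, iters=25, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = dict(enumerate(rng.sample(points, k)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in centers:
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = centroid(members)
    return [nearest(p, centers) for p in points]

def easiness(points, labels, v=5, noise=1.5, seed=0):
    """Hypothetical easiness score: V-fold cross-validated accuracy of a
    nearest-centroid classifier trained on the cluster labels and tested
    on held-out points jittered with Gaussian noise (assumed scale)."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    correct = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        centers = {
            lab: centroid([points[i] for i in train if labels[i] == lab])
            for lab in set(labels[i] for i in train)
        }
        for i in fold:
            jittered = tuple(x + rng.gauss(0, noise) for x in points[i])
            correct += nearest(jittered, centers) == labels[i]
    return correct / len(points)

# Synthetic data (assumed): two well-separated blobs vs. structureless noise.
rng = random.Random(42)
blobs = [(rng.gauss(cx, 1), rng.gauss(cy, 1))
         for cx, cy in [(0, 0), (10, 10)] for _ in range(100)]
uniform = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]

easy_score = easiness(blobs, kmeans(blobs, 2))
hard_score = easiness(uniform, kmeans(uniform, 2))
print(easy_score, hard_score)
```

Under these assumptions, a clustering of well-separated data should score close to 1, while a clustering imposed on structureless data should score lower, because points near the arbitrary cluster boundary change label under perturbation. The perturbation matters: k-means labels are always separable by nearest centroid, so unperturbed accuracy alone would be high in both cases.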

Speaker's biography

Vladimir Estivill-Castro is a Professor in the School of Information and Communication Technology, Director of Mi-PAL and Director of the Autonomous Systems Program of the Institute for Intelligent and Integrated Systems (IIIS) at Griffith University (Australia). He is also a visiting scholar at Universitat Pompeu Fabra in Spain. His main interests are algorithmic engineering, computational complexity, intelligent data analysis, privacy-preserving data mining and knowledge discovery. Prof. Estivill-Castro holds a Ph.D. from the University of Waterloo in Canada and degrees from UNAM in Mexico. He serves on the editorial board of the Journal of Research and Practice in Information Technology, is editor-in-chief of the series Conferences in Research and Practice in Information Technology, and serves on the editorial review board of the International Journal of Data Warehousing and Mining. He was recently awarded an ALTC citation for his PhD supervision.