Le Song

 

PhD Student, School of Information Technologies,

The University of Sydney, Sydney, NSW 2006, Australia

Statistical Machine Learning Program,

National ICT Australia, Canberra, ACT 2601, Australia

 

Thesis Advisor: Alex Smola

 

E-Mail:

Phone: +61 401 946 960

 

 


 

Thesis Draft

 

Research Interests

Machine learning, kernel methods and information visualization. Applications to biological and social data analysis.

 

Learning via Hilbert Space Embedding of Distributions

This framework comprises methods that compute distances between distributions without intermediate density estimation. The methods also let algorithm designers specify which properties of a distribution are most relevant to their problems. Distributions are embedded into Hilbert spaces via mean maps, and all subsequent operations on distributions are carried out in the Hilbert space. This often leads to algorithms that are simpler and more effective than information theoretic methods in a broad range of applications.
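As a rough illustration of the mean-map idea, the sketch below estimates the squared RKHS distance between the empirical mean embeddings of two samples (often called the maximum mean discrepancy). The Gaussian kernel, its bandwidth `gamma`, the function names and the toy samples are all assumptions made for this example; it is not the code used in the papers listed below.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, kernel):
    """Average kernel value over all pairs: the inner product of two empirical mean maps."""
    return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd_squared(xs, ys, kernel=rbf_kernel):
    """Biased estimate of ||mu_X - mu_Y||^2 in the Hilbert space induced by `kernel`."""
    return (mean_kernel(xs, xs, kernel)
            - 2.0 * mean_kernel(xs, ys, kernel)
            + mean_kernel(ys, ys, kernel))


# Toy one-dimensional samples (hypothetical data).
sample_p = [(0.0,), (0.1,), (-0.1,)]
sample_q = [(3.0,), (3.1,), (2.9,)]

distance_same = mmd_squared(sample_p, sample_p)     # zero: identical samples
distance_shifted = mmd_squared(sample_p, sample_q)  # clearly positive: shifted sample
```

No density is estimated at any point: only kernel evaluations between sample points are needed, and swapping the kernel changes which properties of the distributions the distance is sensitive to.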

 

Learning via Dependence Estimation

This research aims to develop a learning framework based on statistical dependence estimation. Many learning tasks can be cast into this framework: for instance, classification can be treated as learning a function such that the dependence between the predicted labels and the given labels is maximized; clustering can be viewed as generating the labels such that their dependence on the data is maximized. Besides classification and clustering, the dependence estimation view of learning can be applied to a variety of other learning tasks, such as feature selection, data point selection and dimensionality reduction. When the dependence is expressed as the square of the Hilbert-Schmidt norm of the cross-covariance operator, this framework recovers many existing algorithms as special cases; they differ only in their choice of kernels. By choosing an appropriate kernel, this framework also leads to many new and interesting algorithms.
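The dependence measure mentioned above can be sketched with the standard biased empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator, tr(KHLH)/(m-1)^2, where K and L are kernel matrices on the data and the labels and H is the centering matrix. The Gaussian kernel on scalars, the function names and the toy data are assumptions for illustration only.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian RBF kernel on scalars (an assumed kernel choice)."""
    return math.exp(-gamma * (a - b) ** 2)


def gram(points, kernel):
    """Kernel matrix as a list of rows."""
    return [[kernel(a, b) for b in points] for a in points]


def center(K):
    """Return H K H, where H = I - (1/m) * ones is the centering matrix."""
    m = len(K)
    row_means = [sum(row) / m for row in K]
    col_means = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic(xs, ys, kernel_x=rbf, kernel_y=rbf):
    """Biased empirical estimate tr(K H L H) / (m - 1)^2."""
    m = len(xs)
    Kc = center(gram(xs, kernel_x))
    L = gram(ys, kernel_y)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace.
    trace = sum(Kc[i][j] * L[j][i] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


# Hypothetical data: two well-separated groups of points.
data = [0.0, 0.1, 0.2, 2.0, 2.1, 2.2]
labels_aligned = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # labels follow the groups
labels_mixed = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]     # labels ignore the groups
labels_constant = [1.0] * 6                       # labels carry no information

hsic_aligned = hsic(data, labels_aligned)
hsic_mixed = hsic(data, labels_mixed)
hsic_constant = hsic(data, labels_constant)
```

In this toy setting the labelling that follows the group structure scores higher than the interleaved one, and a constant labelling scores zero, which is the sense in which clustering can be read as choosing labels that maximize dependence on the data.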

 

Biomedical Signal Processing and Brain-Computer Interface

A brain-computer interface (BCI) is a communication system that relies on the brain rather than the body for control and feedback. My research employs a novel type of feature based explicitly on the neurophysiology of EEG signals for classification. EEG signals are considered as the outputs of a networked dynamical system. The nodes of this system consist of cortical patches, while the links correspond to neural fibers. A large and complex system like this often generates interesting collective dynamics, such as synchronization in the activities of the nodes, which change the EEG patterns measured on the scalp. Features derived from the collective dynamics of the system are employed for classification.
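To make the notion of synchronization between nodes concrete, here is a minimal sketch of one common synchrony measure, the phase-locking value between two channels. It assumes the instantaneous phases have already been extracted from the EEG (for example with a Hilbert transform, not shown), and it is not necessarily the feature used in the publications below; the function name and the synthetic phase series are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Magnitude of the mean unit phasor of the phase difference (between 0 and 1)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Synthetic phase series in radians (hypothetical, not real EEG).
n = 100
node_a = [0.1 * t for t in range(n)]
node_b = [0.1 * t + 0.5 for t in range(n)]                  # constant lag: synchronized
node_c = [0.1 * t + 2 * math.pi * t / n for t in range(n)]  # drifting lag: unsynchronized

locked = phase_locking_value(node_a, node_b)    # close to 1
drifting = phase_locking_value(node_a, node_c)  # close to 0
```

Computing such a value for every pair of channels gives one number per link of the network, which could then serve as a feature vector for a classifier.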

 

Visualizing Biological and Social Networks

Much of the world’s information has a relational structure and can be modelled mathematically as graphs. Examples include webgraphs, social networks, and biological networks. Recent discoveries show that many of these large and complex networks exhibit the small world phenomenon and follow a power-law degree distribution. Traditional graph drawing algorithms based on random graph models thus fail to produce an effective visualisation for these networks. We designed new graph drawing algorithms that take advantage of these two emergent properties.

 

Publications

 

1.     L. Song, X. Zhang, A. Smola, A. Gretton and B. Schoelkopf, “Tailoring density estimation via reproducing kernel moment matching,” 25th International Conference on Machine Learning (ICML 2008).

 

2.     S. Kuan, J. Gatt, C. Dobson-Stone, D. Palmer, R. Paul, L. Song, E. Gordon, P. Schofield and L. Williams, “A polymorphism of the MAOA gene is associated with emotional brain and behaviour markers of antisocial and psychopathic personality traits,” (submitted to the Journal of Neuroscience).

 

3.     L. Song, A. Smola, K. Borgwardt and A. Gretton, “Colored maximum variance unfolding,” Neural Information Processing Systems 2007 (NIPS 07). (Full Oral Presentation). [pdf][appendix]

 

4.     A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schoelkopf and A. Smola, “A kernel statistical test of independence,” Neural Information Processing Systems 2007 (NIPS 07). (Poster Spotlight).

 

5.     L. Song, A. Smola, A. Gretton, J. Bedo and K. Borgwardt, “Feature selection via dependence maximization,” Journal of Machine Learning Research. [submitted][preprint]

 

6.     A. Smola, A. Gretton, K. Borgwardt, L. Song and B. Schoelkopf, “A Hilbert space embedding for distributions,” 18th International Conference on Algorithmic Learning Theory (ALT 2007). [pdf]

 

7.     L. Song, J. Bedo, K. Borgwardt, A. Gretton and A. Smola, “Gene selection via the BAHSIC family of algorithms,” 15th Intl. Conference on Intelligent Systems for Molecular Biology (ISMB 2007). [preprint][supplementary]

 

8.     L. Song, A. Smola, A. Gretton, K. Borgwardt and J. Bedo, “Supervised feature selection via dependence estimation,” 24th International Conference on Machine Learning (ICML 2007). (long version or technical report [pdf])

 

9.     L. Song, A. Smola, A. Gretton and K. Borgwardt, “A dependence maximization view of clustering,” 24th International Conference on Machine Learning (ICML 2007). (long version or technical report [pdf])

 

10. L. Williams, D. Palmer, B. Liddell, L. Song and E. Gordon, “The ‘when’ and ‘where’ of perceiving signals of threat versus non-threat,” NeuroImage, vol. 31, pp. 458–467, 2006. [link]

 

11. L. Song and J. Epps, “Classifying EEG for brain-computer interfaces: learning optimal filters for dynamical system features,” 23rd International Conference on Machine Learning (ICML 2006). [pdf]

 

12. L. Song and J. Epps, “Improving the separability of EEG signals during motor imagery with an efficient circular Laplacian,” 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006). [pdf]

 

13. L. Song, E. Gordon and E. Gysels, “Phase Synchrony Rate for the Recognition of Motor Imagery in BCIs,” Neural Information Processing Systems 2005 (NIPS 05). [pdf]

 

14. L. Song, “Desynchronization network analysis for the recognition of imagined movement,” 27th IEEE EMBS Annual International Conference, 2005. [pdf]

 

15. W. Huang, C. Murray, X. Shen, L. Song, Y.X. Wu and L. Zheng, “Visualization and analysis of network motifs,” 9th International Conference on Information Visualization (IV 2005), 2005. [pdf]

 

16. A. Ahmed, T. Dwyer, S.H. Hong, C. Murray, L. Song and Y.X. Wu, “Visualization and analysis of large and complex scale-free networks,” IEEE VGTC Symposium on Visualization (EUROGRAPHICS), 2005. [pdf]

 

17. L. Song and M. Takatsuka, “Real-time 3d finger pointing for an augmented desk,” 6th Australasian User Interface Conference, CRPIT 40, 2005. [pdf]

 

18. L. Zheng, L. Song and P. Eades, “Crossing minimization problems of drawing bipartite graphs in two clusters,” Asian-Pacific Symposium on Information Visualization, CRPIT 45, 2005. [pdf]

 

19. A. Ahmed, T. Dwyer, S.H. Hong, C. Murray, L. Song and Y.X. Wu, “Wilmascope graph visualization,” IEEE Information Visualization (InfoVis), 2004. [pdf][link]

 

Earlier Work

 

1.     S.Q. Liu and L. Song, “Curvature relation of wave front and wave changing in external field,” Applied Mathematics and Mechanics, 26(7), 2005. [Chinese draft pdf]

 

2.     S.Q. Liu and L. Song, “The numerical analysis of Lobster stomatogastric nervous system,” Acta Biophysica Sinica, 20(3), 2004. [Chinese pdf]

 

3.     L. Song, B. Jiang and Y.L. Zhu, “The waterways - a certain future,” The Interdisciplinary Contest in Modeling (hosted by the Consortium for Mathematics and Its Applications, and NSF), 2001. [pdf]

 

Software

 

·        BAHSIC: backward elimination for feature selection via dependence estimation. Supports linear, nonlinear, binary, multiclass and regression feature selection. (A prototype written in Python)

 

·        CLUHSIC: clustering via dependence estimation. An additional metric can be applied on the cluster labels. (Written in C, with examples in Matlab)

 

·        MUHSIC: dimensionality reduction via dependence estimation. Side information can be incorporated into the visualization. (A mix of Matlab and C)

 

·        Incomplete Cholesky Decomposition: linearize the kernel matrix for a nonlinear kernel. (Written in Python)

 

·        Other code in ELEFANT.
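For readers unfamiliar with the incomplete Cholesky item above, the following is a minimal sketch of a pivoted incomplete Cholesky factorization, which approximates a kernel matrix K by a low-rank product G G^T so that the rows of G can be used as explicit features. It is an independent illustration, not the released code; the function names, the tolerance values and the toy points are assumptions.

```python
import math


def incomplete_cholesky(K, tol=1e-6):
    """Return G (list of rows) with K approximately equal to G G^T.

    Greedily pivots on the largest residual diagonal entry and stops once
    every residual diagonal entry is at most `tol`.
    """
    n = len(K)
    diag = [K[i][i] for i in range(n)]  # residual diagonal of K - G G^T
    G = [[] for _ in range(n)]          # one row per point, columns appended
    for _ in range(n):
        pivot = max(range(n), key=lambda i: diag[i])
        if diag[pivot] <= tol:
            break
        root = math.sqrt(diag[pivot])
        # New column: residual of column `pivot`, scaled by the pivot root.
        col = [(K[i][pivot] - sum(g * h for g, h in zip(G[i], G[pivot]))) / root
               for i in range(n)]
        for i in range(n):
            G[i].append(col[i])
            diag[i] -= col[i] ** 2
    return G


def max_error(K, G):
    """Largest absolute entry of K - G G^T."""
    n = len(K)
    return max(abs(K[i][j] - sum(a * b for a, b in zip(G[i], G[j])))
               for i in range(n) for j in range(n))


# Gaussian kernel matrix on two tight groups of points (hypothetical data).
points = [0.0, 0.1, 0.2, 5.0, 5.1]
K = [[math.exp(-(a - b) ** 2) for b in points] for a in points]

G_full = incomplete_cholesky(K, tol=1e-10)  # near-exact factorization
G_low = incomplete_cholesky(K, tol=0.1)     # coarse low-rank approximation
rank_low = len(G_low[0])
```

With the coarse tolerance, one column per group of points is enough here, which shows how a loose tolerance trades accuracy for a much smaller factor.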

 

Datasets

 

·        Feature selection

 

·        Clustering

 

·        Dimensionality reduction

 

Links

 

·        Alex Smola

·        Arthur Gretton

·        Bernhard Schölkopf

·        Choon Hui Teo

·        Karsten Borgwardt

·        Quoc Viet Le

·        Shanheng Zhao

·        Xinhua Zhang

·        Ying Xin Wu