Basser Seminar Series

Topic Modeling from Continuous-Time Document Streams with Dirichlet Hawkes Processes

Speaker: Professor Le Song
Georgia Institute of Technology, Department of Computational Science and Engineering, College of Computing, USA

When: Thursday 13 August, 2015, 5-6pm. Please note the different day and time from usual.

Where: The University of Sydney, School of IT Building, SIT Board Room (Room 124), Level 1


Abstract

Topics and clusters in document streams, such as online news articles, can be induced by their textual contents as well as by the temporal dynamics of their arrival patterns. Can we leverage both sources of information to obtain a better clustering of the documents, and distill information that cannot be extracted from text alone? I will talk about a novel random process, the Dirichlet Hawkes process, which takes both text and temporal information into account in a unified framework. A distinctive feature of the model is that the preferential attachment of items to clusters, which in Dirichlet processes is driven by cluster sizes, is now driven by the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This model establishes a previously unexplored connection between Bayesian nonparametrics and temporal point processes: it lets the number of clusters grow to accommodate the increasing complexity of online streaming content, while adapting to the ever-changing dynamics of the continuous arrival times. We conducted large-scale experiments on both synthetic and real-world news articles, and showed that Dirichlet Hawkes processes recover both meaningful topics and temporal dynamics, leading to better predictive performance in terms of content perplexity and the arrival times of future documents.
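To make the attachment rule concrete, here is a minimal sketch of how a Dirichlet Hawkes process might generate arrival times and cluster labels. It is an illustration under stated assumptions, not the speaker's implementation: it ignores the text component entirely, assumes one exponential triggering kernel alpha * exp(-beta * dt) shared by all clusters, and uses Ogata's thinning to sample times. The names `simulate_dhp`, `cluster_intensities`, `lambda0`, `alpha` and `beta` are hypothetical. An accepted document joins existing cluster k with probability proportional to that cluster's current Hawkes intensity, or opens a new cluster with probability proportional to the base rate `lambda0`, in place of the cluster-size counts used by a Dirichlet process.

```python
import math
import random


def cluster_intensities(t, events, alpha, beta):
    """Hawkes intensity of each existing cluster at time t.

    events: list of (time, cluster_id) pairs with time < t.
    Assumed kernel (hypothetical choice): alpha * exp(-beta * (t - t_j)).
    """
    lam = {}
    for t_j, k in events:
        lam[k] = lam.get(k, 0.0) + alpha * math.exp(-beta * (t - t_j))
    return lam


def simulate_dhp(lambda0, alpha, beta, horizon, seed=0):
    """Sample (arrival_time, cluster_id) pairs on [0, horizon)."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The total intensity only decays between events, so its value
        # right now is a valid upper bound for Ogata's thinning step.
        lam_bar = lambda0 + sum(cluster_intensities(t, events, alpha, beta).values())
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        lam = cluster_intensities(t, events, alpha, beta)
        total = lambda0 + sum(lam.values())
        if rng.random() * lam_bar > total:
            continue  # candidate time rejected by thinning
        # Accepted: join cluster k with probability lam_k / total,
        # or open a new cluster with probability lambda0 / total.
        u = rng.random() * total
        chosen = None
        for k, lam_k in lam.items():
            if u < lam_k:
                chosen = k
                break
            u -= lam_k
        if chosen is None:
            chosen = len(lam)  # next unused cluster id
        events.append((t, chosen))
    return events
```

For example, `simulate_dhp(lambda0=0.5, alpha=0.8, beta=1.0, horizon=50.0)` returns time-ordered (time, cluster) pairs in which bursts of documents tend to share a cluster, because each arrival temporarily raises its own cluster's intensity.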

Speaker's biography

Le Song is an assistant professor in the Department of Computational Science and Engineering, College of Computing, Georgia Institute of Technology. He received his Ph.D. in Computer Science from the University of Sydney and NICTA in 2008, and then conducted postdoctoral research in the School of Computer Science, Carnegie Mellon University, from 2008 to 2011. Before joining the Georgia Institute of Technology, he worked briefly as a research scientist at Google. His principal research interests lie in machine learning, especially nonparametric and kernel methods, analysis of networks and spatial/temporal dynamics, optimization, and applications of machine learning to interdisciplinary problems. He is a recipient of an NSF CAREER Award (2014), the IPDPS'15 Best Paper Award, the NIPS'13 Outstanding Paper Award, and the ICML'10 Best Paper Award. He has also served as an area chair for top machine learning conferences such as ICML, NIPS and AISTATS.