Basser Seminar Series

Who ’Dat? Identity resolution in large email collections

Speaker: Associate Professor Douglas W. Oard
University of Maryland, USA

Time: Friday 12 March 2010, 4:00-5:00pm
Refreshments will be available from 3:30pm

Location: The University of Sydney, School of IT Building, Lecture Theatre (Room 123), Level 1

Automated techniques that can support the human activities of search and sense-making in large email collections are of increasing importance for a broad range of uses, including historical scholarship, law enforcement and intelligence applications, and “e-discovery” by lawyers incident to civil litigation. In this talk, I’ll briefly describe some of the work to date on searching large email collections, and then for most of the talk I will focus on the more challenging task of support for sense-making. Specifically, I’ll describe joint work with Tamer Elsayed to automatically resolve the identity of people who are mentioned ambiguously (e.g., just by first name) in a collection of email from a failed corporation (Enron). Our results indicate that for people who are well represented in the collection we can use a generative model to guess the right identity about 80% of the time, and for others we are right about half the time. I’ll conclude the talk with a few remarks on our next directions for techniques, evaluation, and additional types of collections to which similar ideas might be applied.

Speaker's biography

Douglas Oard is an Associate Professor at the University of Maryland, College Park, with joint appointments in the College of Information Studies and the Institute for Advanced Computer Studies; he is on sabbatical at RMIT in Melbourne for the first half of 2010. Dr. Oard earned his Ph.D. in Electrical Engineering from the University of Maryland, and his research interests center on the use of emerging technologies to support information seeking by end users. His recent work has focused on interactive techniques for cross-language information retrieval, searching conversational media, and support for sense-making in large digital archival collections.
