Basser Seminar Series

Arabesque: A System for Distributed Graph Mining

Speaker: Dr Marco Serafini
Qatar Computing Research Institute

When: Friday 1 April, 2016, 11am-12pm. Please note the different day and time from usual.

Where: The University of Sydney, School of IT Building, SIT Lecture Theatre (Room 123), Level 1

Abstract

Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that meet “interestingness” criteria specified by the user. These algorithms are very important in areas such as social networks, the semantic web, and bioinformatics.

This talk will present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which simply computes outputs and decides whether each subgraph should be extended further. Arabesque’s API has been used to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. These implementations require only a handful of lines of code, scale to trillions of subgraphs, and are, in some cases, the first available distributed solutions.
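
To make the filter-process idea concrete, the following is a minimal single-machine sketch in Python. It is not Arabesque's actual API: the names explore, keep, process, is_clique and find_cliques are hypothetical, and the distributed execution that the platform provides is omitted entirely. The sketch only illustrates the division of labour described above, where the exploration loop generates and extends subgraphs while the application supplies a filter and an output step, here for finding cliques.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# Undirected graph as adjacency sets (an assumed representation for this sketch).
Graph = Dict[int, Set[int]]
Subgraph = FrozenSet[int]  # a vertex-induced subgraph, identified by its vertices


def explore(
    graph: Graph,
    keep: Callable[[Graph, Subgraph], bool],
    process: Callable[[Graph, Subgraph], None],
    max_size: int,
) -> None:
    """Explore connected vertex-induced subgraphs level by level.

    The application supplies `keep` (the filter) and `process` (the output
    step). Subgraphs rejected by `keep` are neither processed nor extended.
    """
    frontier: Set[Subgraph] = {frozenset([v]) for v in graph}
    while frontier:
        next_frontier: Set[Subgraph] = set()
        for sub in frontier:
            if not keep(graph, sub):
                continue  # pruned: no output, no further extension
            process(graph, sub)
            if len(sub) < max_size:
                # Extend by one neighbouring vertex; the set removes duplicates.
                for v in sub:
                    for n in graph[v]:
                        if n not in sub:
                            next_frontier.add(sub | {n})
        frontier = next_frontier


def is_clique(graph: Graph, sub: Subgraph) -> bool:
    """Filter: every pair of distinct vertices in `sub` must be adjacent."""
    return all(u in graph[v] for v in sub for u in sub if u != v)


def find_cliques(graph: Graph, max_size: int) -> List[Subgraph]:
    """Application code: collect every clique with at most `max_size` vertices."""
    found: List[Subgraph] = []
    explore(graph, is_clique, lambda g, s: found.append(s), max_size)
    return found


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    example: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    for clique in find_cliques(example, max_size=3):
        print(sorted(clique))
```

Pruning in the filter is safe for cliques because every subset of a clique is itself a clique, so discarding a non-clique never loses a larger clique. Other mining problems would need a filter with a similar property for this pruning to remain correct.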

Speaker's biography

Marco Serafini is a Scientist at the Qatar Computing Research Institute, where he develops programming abstractions and systems for scalable graph search, exploration, and mining. He also works on elasticity and load balancing for real-time distributed data management systems, as well as on distributed coordination. His work has appeared in major conferences such as VLDB, SOSP, NSDI, ICDE, and PODC. He serves or has served as a PC member of VLDB, ICDE, EuroSys, ICDCS, and WWW, among others, and he co-chaired the PaPoC workshop, which is co-located with EuroSys. Before joining QCRI he was with Yahoo! Research, where he worked on the ZooKeeper coordination system. Marco received his PhD from TU Darmstadt, Germany.