Basser Seminar Series

Bit-by-bit: Storage and querying of RDF data using bit-vectors

Speaker: Dr Medha Atre

When: Wednesday 10 June, 2015, 4:00-5:00pm

Where: The University of Sydney, School of IT Building, SIT Lecture Theatre (Room 123), Level 1


As the volume of RDF data on the web grows at breakneck speed, efficient storage and querying of this data have become major challenges. SPARQL, the standard query language for RDF, has many structural similarities to SQL. RDF data can be serialized and stored as a relational table, so SQL query optimization techniques can also be applied to the optimization of SPARQL queries.
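To make the relational view concrete, here is a minimal sketch, assuming a single three-column `triples` table and made-up sample data (neither comes from the talk). It shows how a two-pattern SPARQL basic graph pattern can be written as an SQL self-join, using Python's built-in `sqlite3` module.

```python
import sqlite3


def run_bgp_as_self_join():
    """Store a few illustrative triples and evaluate a two-pattern BGP in SQL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (subject, predicate, object) triple.
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("alice", "worksAt", "acme"),
            ("bob", "worksAt", "initech"),
            ("acme", "locatedIn", "sydney"),
            ("initech", "locatedIn", "troy"),
        ],
    )
    # SPARQL BGP:  ?person worksAt ?org .  ?org locatedIn ?city
    # The shared variable ?org becomes the inner-join condition t1.o = t2.s.
    rows = conn.execute(
        """
        SELECT t1.s, t1.o, t2.o
        FROM triples AS t1
        JOIN triples AS t2 ON t1.o = t2.s
        WHERE t1.p = 'worksAt' AND t2.p = 'locatedIn'
        ORDER BY t1.s
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling `run_bgp_as_self_join()` returns one row per binding of `?person`, `?org` and `?city`. Each additional triple pattern in a query adds another self-join on the same table, which is why join optimization matters for SPARQL.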

In this talk the speaker presents novel ways of storing RDF data using compressed bit-vectors and of exploiting semi-joins to evaluate SPARQL queries, in place of conventional optimization techniques. The technique focuses mainly on SPARQL basic graph patterns (BGPs), which correspond to SQL inner-join queries, and on SPARQL OPTIONAL patterns, which correspond to SQL left-outer-join queries. The talk also gives an overview of ongoing work to extend the technique to other SPARQL constructs.
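The general idea of a semi-join over bit-vectors can be sketched as follows. This is an illustration only, not the speaker's implementation: it assumes RDF terms have already been mapped to small integer IDs, uses plain Python integers as uncompressed bit-vectors, and the function names are hypothetical.

```python
def bits_of(ids):
    """Build a bit-vector (a Python int) with bit i set for every ID i."""
    vec = 0
    for i in ids:
        vec |= 1 << i
    return vec


def ids_of(vec):
    """List the IDs whose bits are set in the bit-vector, in ascending order."""
    return [i for i in range(vec.bit_length()) if (vec >> i) & 1]


def semi_join_prune(left, right):
    """Prune two triple patterns that share a join variable.

    left  -- (subject, object) ID pairs; the join variable is the object
    right -- (subject, object) ID pairs; the join variable is the subject
    Pairs whose join value has no partner on the other side are dropped.
    """
    left_vec = bits_of(o for _, o in left)
    right_vec = bits_of(s for s, _ in right)
    shared = left_vec & right_vec  # a single bitwise AND finds the common values
    pruned_left = [(s, o) for s, o in left if (shared >> o) & 1]
    pruned_right = [(s, o) for s, o in right if (shared >> s) & 1]
    return pruned_left, pruned_right
```

For example, with `left = [(0, 10), (1, 11), (2, 12)]` and `right = [(10, 20), (12, 21), (13, 22)]`, only the join values 10 and 12 appear on both sides, so the pairs `(1, 11)` and `(13, 22)` are discarded before any join result is materialized.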

Speaker's biography

Medha Atre's primary research area is the application of database techniques to the management of Semantic Web (RDF) data. She completed her Ph.D. at Rensselaer Polytechnic Institute (Troy, NY, USA). As part of her Ph.D., she developed an open-source system, BitMat, for efficient processing of SPARQL join queries. After her Ph.D., as part of her independent research, she extended this algorithm to SPARQL OPTIONAL pattern (left-outer-join) queries; this work has been accepted at the SIGMOD 2015 conference. She is currently working on further extensions of this work to a larger set of SPARQL constructs.

During a postdoctoral tenure at the University of Pennsylvania (Philadelphia, PA, USA), Medha worked on the enhancement of a distributed shared-nothing database system. She is currently planning to extend her Ph.D. thesis work to a distributed setting, with the aim of developing an efficient distributed query processing framework for graph data. She has also done some preliminary work on path query processing over RDF (graph) data.