Research and Research-Linked projects for sem1 2007, supervised by Alan Fekete

An Isolation-Sensitive Benchmark for OLTP

The standard benchmark used to evaluate the performance of OLTP systems is TPC-C. One key requirement of the benchmark is that the system under test must run the benchmark transactions correctly (serializably); indeed, the benchmark seems designed to produce many conflicts between concurrent transactions at high load, in order to test the efficiency of the locking mechanisms. However, it is known that special characteristics of the benchmark programs mean that some incorrect concurrency control algorithms (such as the Snapshot Isolation method used in Oracle and PostgreSQL) still give serializable executions for the particular programs that make up the benchmark. The goal of this project is to design a benchmark that is targeted at investigating the performance/correctness tradeoffs. It should have the property that different weak isolation mechanisms (such as snapshot or read committed isolation) produce easily detected violations of integrity constraints; also, different concurrency control mechanisms should give different violations. Of course, the benchmark should still have the properties of any good benchmark: easily reproduced, sensitive to the different aspects of performance (including disk management, buffer management, query optimization, etc.), and a reasonable model of typical application behaviour.
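
To make the idea of an "easily detected violation" concrete, here is a minimal sketch of the classic write-skew anomaly under snapshot isolation. It is a toy in-memory model, not a real database: the SnapshotStore class, the withdraw transaction and the constraint x + y >= 0 are all assumptions made for illustration. Two transactions each check the constraint against their own snapshot, write different items, and both commit, leaving the constraint broken.

```python
class SnapshotStore:
    """Toy multiversion store with snapshot isolation (illustrative only)."""

    def __init__(self, initial):
        self.clock = 0
        # key -> list of (commit_timestamp, value), oldest first
        self.versions = {k: [(0, v)] for k, v in initial.items()}

    def begin(self):
        # A transaction reads as of the clock value when it starts.
        return {"start": self.clock, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        # Latest version committed at or before the transaction's start.
        for ts, value in reversed(self.versions[key]):
            if ts <= txn["start"]:
                return value
        raise KeyError(key)

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        # First committer wins: abort if a written key has a newer version.
        for key in txn["writes"]:
            if self.versions[key][-1][0] > txn["start"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions[key].append((self.clock, value))
        return True

    def latest(self, key):
        return self.versions[key][-1][1]


def withdraw(store, txn, account, amount):
    """Withdraw only if the combined balance x + y stays non-negative."""
    total = store.read(txn, "x") + store.read(txn, "y")
    if total - amount >= 0:
        store.write(txn, account, store.read(txn, account) - amount)


def run_write_skew():
    store = SnapshotStore({"x": 50, "y": 50})
    t1, t2 = store.begin(), store.begin()   # concurrent transactions
    withdraw(store, t1, "x", 80)
    withdraw(store, t2, "y", 80)
    ok1, ok2 = store.commit(t1), store.commit(t2)
    return ok1, ok2, store.latest("x") + store.latest("y")
```

Both commits succeed because the write sets are disjoint, yet the final sum is negative. A benchmark in the spirit of this project could check such a constraint after each run to detect non-serializable executions.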

This project is suitable for Research Track (18crpts). This project would be part of the activity of the Middleware research group of the School. This project requires good awareness of performance issues, very careful handling of numerical data, and some programming.

Support for general predicates in promises

"Promiss" is an approach to providing isolation between long-running activities in service oriented architectures, introduced recently by researchers from Sydney University and CSIRO (see Isolation Support for Service-Based Applications: A Position Paper" by Greenfield, Fekete, Jang, Kuo and Nepal). We have discussed how to implement promises for the simpler access styles in Delivering Promises for Isolation Support" by Jang, Fekete and Greenfield, but an important case has not yet been designed fully. This research project will complete the story by showing how to support promises which refer to resources based on arbitrary properties. For example, one client could request a promise of availability for a hotel room with a sea view, while another client wants to be guaranteed that they can get a non-smoking room. The promise Manager will need to be aware of which rooms can satisfy different requests, and make sure double-booking never occurs.

This project is suitable as an 18crpts Research project. This project would be part of the activity of the Middleware research group of the School. This project requires good programming skills, including database queries and use of logic tools.

Update-anywhere with Berkeley DB replication

Berkeley DB is a popular open source embedded database library used in a wide variety of open source and commercial applications. It supports many advanced database features such as ACID transactions, fine-grained locking, hot backups and replication. The most recent release adds support for multiversion concurrency control with snapshot isolation. The replication support in Berkeley DB allows a group of database environments to share a set of databases. There is a single master database environment and one or more client database environments. Master environments support both database reads and writes; client environments support only database reads. If the master environment fails, applications may upgrade a client to be the new master. All databases are replicated from the master to all clients. Currently, the application has to keep track of which node is currently the master, and direct updates from clients to the master (something that Berkeley DB does not provide any help with). The goal of this project is to allow updates to be initiated at any node, and use snapshot isolation to perform the updates safely at the master site. It will involve getting your hands dirty with replicated database code.

Snapshot isolation relies on maintaining multiple versions of data, and reading data as it was at a given timestamp (the timestamp current when the transaction started). When a client wants to initiate an update, this project will extend Berkeley DB to begin two transactions, one on the client and one on the master. All reads can use the transaction on the client; updates will be performed on the master using the corresponding transaction there. The "first committer wins" rule of snapshot isolation will ensure that if the transaction on the master commits successfully, the data read by the client was valid.
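
The paired-transaction idea can be sketched with a toy model. Nothing here is Berkeley DB API: Master, Client and PairedTxn are hypothetical names, the log shipping is simulated by a sync call, and copying the client's data stands in for multiversion reads at a timestamp. The point is only to show reads served by the client, writes validated at the master, and first committer wins applied relative to the shared read timestamp.

```python
class Master:
    """Toy master: applies writes and enforces first committer wins."""

    def __init__(self, initial):
        self.clock = 0
        self.data = dict(initial)
        self.last_write = {k: 0 for k in initial}  # key -> commit timestamp
        self.log = []  # (commit_ts, key, value) in commit order

    def commit(self, read_ts, writes):
        # Abort if any written key was committed after the read timestamp.
        for key in writes:
            if self.last_write.get(key, 0) > read_ts:
                return False
        self.clock += 1
        for key, value in writes.items():
            self.data[key] = value
            self.last_write[key] = self.clock
            self.log.append((self.clock, key, value))
        return True


class Client:
    """Toy read-only replica that applies the master's log when synced."""

    def __init__(self, initial):
        self.applied_ts = 0
        self.data = dict(initial)

    def sync(self, master):
        for ts, key, value in master.log:
            if ts > self.applied_ts:
                self.data[key] = value
        self.applied_ts = master.clock


class PairedTxn:
    """Reads from the client's snapshot, sends writes to the master."""

    def __init__(self, client, master):
        self.master = master
        self.read_ts = client.applied_ts    # timestamp shared by both sides
        self.snapshot = dict(client.data)   # copy stands in for versioned reads
        self.writes = {}

    def read(self, key):
        return self.snapshot[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.master.commit(self.read_ts, self.writes)


def demo():
    master, client = Master({"stock": 10}), Client({"stock": 10})
    t1, t2 = PairedTxn(client, master), PairedTxn(client, master)
    t1.write("stock", t1.read("stock") - 1)
    t2.write("stock", t2.read("stock") - 1)
    first, second = t1.commit(), t2.commit()   # second loses the race
    client.sync(master)
    t3 = PairedTxn(client, master)             # retry on a fresh snapshot
    t3.write("stock", t3.read("stock") - 1)
    return first, second, client.data["stock"], t3.commit(), master.data["stock"]
```

In this sketch the second concurrent decrement is rejected rather than silently overwriting the first, and a retry after the client catches up succeeds. Note that the check covers only keys the transaction wrote, as in ordinary snapshot isolation.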

This project is suitable for Research Track (18crpts). There are two parts to the work: setting up a pair of transactions with equal read timestamps, and ensuring that the application does in fact run at snapshot isolation. In particular, if within a transaction the application writes then reads some data, it should see its own updates. This will involve either shipping results from master to client within the scope of the transaction, or performing the updates locally on the client in addition to on the master. There are several design choices involved, and the evaluation of the resulting system in light of those choices will be a critical component of the project. Skills required: excellent C programming and awareness of concurrency control issues.
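
The two options for letting a transaction see its own updates can be compared with a small sketch. This is an illustration under assumptions, not a proposed implementation: MasterTxn and ClientTxn are hypothetical stand-ins, and the round-trip counter is just a crude proxy for network cost. With cache_locally=True the client keeps a local copy of each write; otherwise a read of a written key is shipped to the master-side transaction.

```python
class MasterTxn:
    """Stand-in for the master-side transaction; counts client round trips."""

    def __init__(self):
        self.writes = {}
        self.round_trips = 0

    def write(self, key, value):
        self.round_trips += 1
        self.writes[key] = value

    def read(self, key):
        self.round_trips += 1
        return self.writes[key]


class ClientTxn:
    """Client-side transaction that must see its own uncommitted writes."""

    def __init__(self, snapshot, master_txn, cache_locally):
        self.snapshot = snapshot
        self.master_txn = master_txn
        self.cache_locally = cache_locally
        self.local = {}       # option 1: writes mirrored on the client
        self.written = set()  # option 2: keys whose value lives on the master

    def write(self, key, value):
        self.master_txn.write(key, value)
        if self.cache_locally:
            self.local[key] = value
        else:
            self.written.add(key)

    def read(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.written:
            return self.master_txn.read(key)  # ship the result from the master
        return self.snapshot[key]


def run(cache_locally):
    master_txn = MasterTxn()
    txn = ClientTxn({"a": 1, "b": 5}, master_txn, cache_locally)
    txn.write("a", 2)
    return txn.read("a"), txn.read("b"), master_txn.round_trips
```

Both variants return the updated value for "a" and the snapshot value for "b"; they differ only in how many times the client contacts the master, which is one of the costs the evaluation would need to measure.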