Research and Research-Linked projects for sem1 2007, supervised by Alan Fekete
An Isolation-Sensitive Benchmark for OLTP
The standard benchmark used to evaluate the performance of OLTP systems is
TPC-C. One key requirement of the benchmark
says that the system under test must run the benchmark transactions correctly
(serializably); indeed, the benchmark seems designed
to produce lots of conflicts between concurrent transactions at high load,
in order to test the efficiency of the locking mechanisms. However, it is known
that special characteristics of the benchmark programs
mean that some incorrect concurrency control algorithms
(such as the Snapshot Isolation
method used in Oracle and PostgreSQL) still give serializable execution for
the particular programs that make up the benchmark. The goal of this project
is to design a benchmark that is targeted at
investigating the performance/correctness tradeoffs. It should have the
property that different weak isolation mechanisms (such as snapshot
or read committed isolation) produce easily detected violations
of integrity constraints; also, different concurrency
control mechanisms should give different violations. Of course, the benchmark
should still have the properties of any benchmark: easily reproduced,
sensitive to the different aspects of performance (including disk management,
buffer management, query optimization, etc), and a reasonable model of
typical application behaviour.
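As an illustration of the kind of violation such a benchmark could provoke, here is a minimal sketch of write skew under snapshot isolation. It is a toy in-memory simulation, not part of any existing benchmark: the class SnapshotStore, the two accounts "x" and "y", and the constraint x + y >= 0 are all assumptions made for this example.

```python
class SnapshotStore:
    """Toy store with snapshot reads and a first-committer-wins check."""

    def __init__(self, initial):
        self.data = dict(initial)                  # latest committed values
        self.commit_ts = {k: 0 for k in initial}   # last commit time per key
        self.clock = 0

    def begin(self):
        # A transaction reads from a private copy taken at its start time.
        return {"start": self.clock, "snapshot": dict(self.data), "writes": {}}

    def read(self, txn, key):
        # A transaction sees its own writes, otherwise its snapshot.
        return txn["writes"].get(key, txn["snapshot"][key])

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        # Abort if a key we wrote was committed by another transaction
        # after our snapshot was taken.
        for key in txn["writes"]:
            if self.commit_ts[key] > txn["start"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.data[key] = value
            self.commit_ts[key] = self.clock
        return True


def withdraw(store, txn, account, amount):
    # Application-level check of the (assumed) constraint x + y >= 0.
    total = store.read(txn, "x") + store.read(txn, "y")
    if total - amount >= 0:
        store.write(txn, account, store.read(txn, account) - amount)


def run_interleaved():
    store = SnapshotStore({"x": 50, "y": 50})
    t1, t2 = store.begin(), store.begin()
    withdraw(store, t1, "x", 80)
    withdraw(store, t2, "y", 80)
    ok1, ok2 = store.commit(t1), store.commit(t2)
    return ok1, ok2, store.data["x"] + store.data["y"]


def run_serial():
    store = SnapshotStore({"x": 50, "y": 50})
    t1 = store.begin()
    withdraw(store, t1, "x", 80)
    store.commit(t1)
    t2 = store.begin()
    withdraw(store, t2, "y", 80)
    store.commit(t2)
    return store.data["x"] + store.data["y"]
```

In the interleaved run the two write sets are disjoint, so both transactions commit and the final sum is negative; a single query over the two accounts is enough to detect the violation. In the serial run the second withdrawal sees the first and does nothing.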
This project is suitable for Research Track (18crpts).
This project would be part of the activity of
the Middleware research group of the School. This project requires good
awareness of performance issues, very careful handling
of numerical data, and some programming.
Support for general predicates in promises
"Promises" is an approach to providing isolation between long-running
activities in service oriented architectures,
introduced recently by researchers
from Sydney University and CSIRO (see
"Isolation Support
for Service-Based Applications: A Position Paper" by
Greenfield, Fekete, Jang, Kuo and Nepal).
We have discussed how to implement promises for the simpler
access styles in "Delivering
Promises for Isolation Support" by Jang, Fekete and Greenfield,
but an important case has not yet been designed fully. This research project
will complete the story by showing how to support promises which refer
to resources based on arbitrary properties. For example,
one client could request a promise of availability for a hotel
room with a sea view, while another client wants to be guaranteed
that they can get a non-smoking room. The Promise Manager will need to
be aware of which rooms can satisfy different requests, and make sure
double-booking never occurs.
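One possible way to think about the hotel example is as a bipartite matching problem: a new promise can be granted only if every outstanding promise can still be assigned its own distinct room. The sketch below is an assumption about how this check might look, not the design from the cited papers; the class PromiseManager, its methods, and the property names are hypothetical.

```python
class PromiseManager:
    """Toy manager: grants a promise only if all promises stay satisfiable."""

    def __init__(self, rooms):
        self.rooms = rooms      # room id -> set of properties of that room
        self.promises = []      # each promise is a set of required properties

    def _candidates(self, required):
        # Rooms whose properties include everything the promise asks for.
        return [r for r, props in self.rooms.items() if required <= props]

    def _all_satisfiable(self, promises):
        assigned = {}           # room id -> index of the promise holding it

        def place(i, seen):
            # Augmenting-path search: try to give promise i a room,
            # moving earlier promises to other rooms if necessary.
            for room in self._candidates(promises[i]):
                if room in seen:
                    continue
                seen.add(room)
                if room not in assigned or place(assigned[room], seen):
                    assigned[room] = i
                    return True
            return False

        return all(place(i, set()) for i in range(len(promises)))

    def request(self, required):
        required = frozenset(required)
        if self._all_satisfiable(self.promises + [required]):
            self.promises.append(required)
            return True
        return False            # granting this could force a double-booking


def demo():
    manager = PromiseManager({
        101: {"sea_view", "non_smoking"},
        102: {"non_smoking"},
    })
    return [
        manager.request({"non_smoking"}),
        manager.request({"sea_view"}),
        manager.request({"non_smoking"}),
    ]
```

Here the first two requests are granted (the non-smoking promise can be served by room 102, leaving 101 for the sea view), while the third is refused because no third room exists. Note that the manager does not fix a room per promise in advance; it only checks that some valid assignment exists.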
This project is suitable as an 18crpts Research project.
This project would be part of the activity of
the Middleware research group of the School. This project requires good
programming skills, including database queries and use of logic tools.
Update-anywhere with Berkeley DB replication
Berkeley DB is a popular open source embedded database library used
in a wide variety of open source and commercial applications. It
supports many advanced database features such as ACID transactions,
fine-grained locking, hot backups and replication. The most recent
release adds support for multiversion concurrency control with
snapshot isolation.
The replication support in Berkeley DB allows a group of database
environments to share a set of databases. There is a single master
database environment and one or more client database environments.
Master environments support both database reads and writes; client
environments support only database reads. If the master environment
fails, applications may upgrade a client to be the new master. All
databases are replicated from the master to all clients.
Currently, the application has to keep track of which node is
the master, and direct updates from clients to the master
(something that Berkeley DB does not provide any help with). The
goal of this project is to allow updates to be initiated at any node,
and use snapshot isolation to perform the updates safely at the
master site. It will involve getting your hands dirty with
replicated database code.
Snapshot isolation relies on maintaining multiple versions of data,
and reading data as it was at a given timestamp (the timestamp
current when the transaction started). When a client wants to
initiate an update, this project will extend Berkeley DB to begin
two transactions, one on the client and one on the master. All
reads can use the transaction on the client, while updates will be
performed on the master using the corresponding transaction there.
The "first committer wins" rule of snapshot isolation will ensure
that if the transaction on the master commits successfully, the data
read by the client was valid.
This project is suitable for Research Track (18crpts).
There are two parts to the work: setting up a pair of transactions
with equal read timestamps, and ensuring that the application does in
fact run at snapshot isolation. In particular, if within a
transaction the application writes then reads some data, it should
see its own updates. This will involve either shipping results from
master to client within the scope of the transaction, or performing
the updates locally on the client in addition to on the master.
There are several design choices involved, and the evaluation of the
resulting system in light of those choices will be a critical
component of the project.
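The two parts described above can be illustrated with a small simulation. This is a Python toy under stated assumptions, not Berkeley DB code and not its API: VersionedStore, PairedTxn and replicate are hypothetical names, the replica's clock stands in for the shared read timestamp, and writes are buffered locally so that a transaction sees its own updates (one of the two options mentioned above).

```python
class VersionedStore:
    """Toy multiversion store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self):
        self.versions = {}
        self.clock = 0

    def read_at(self, key, ts):
        # Return the newest version committed at or before ts.
        value = None
        for commit_ts, v in self.versions.get(key, []):
            if commit_ts <= ts:
                value = v
        return value

    def last_commit_ts(self, key):
        history = self.versions.get(key, [])
        return history[-1][0] if history else 0

    def install(self, writes):
        # Commit a write set as one new version per key.
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock


def replicate(master, replica):
    # Assumption: the replica receives a full copy of the master's versions.
    replica.versions = {k: list(v) for k, v in master.versions.items()}
    replica.clock = master.clock


class PairedTxn:
    """Reads run on the replica; writes are validated and applied on the master."""

    def __init__(self, master, replica):
        self.master, self.replica = master, replica
        self.read_ts = replica.clock   # same read timestamp used on both sides
        self.writes = {}               # local buffer of this transaction's writes

    def read(self, key):
        if key in self.writes:         # a transaction sees its own updates
            return self.writes[key]
        return self.replica.read_at(key, self.read_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins, checked on the master against read_ts.
        for key in self.writes:
            if self.master.last_commit_ts(key) > self.read_ts:
                return False
        self.master.install(self.writes)
        return True


def demo():
    master, replica = VersionedStore(), VersionedStore()
    master.install({"a": 1})
    replicate(master, replica)
    t1, t2 = PairedTxn(master, replica), PairedTxn(master, replica)
    t1.write("a", 2)
    own = t1.read("a")                 # sees its own write
    other = t2.read("a")               # still sees the snapshot value
    t2.write("a", 3)
    return own, other, t1.commit(), t2.commit(), master.read_at("a", master.clock)
```

In this toy, the second transaction to commit is rejected because the master already holds a newer version of the key than the shared read timestamp allows. A real design would also have to handle a replica that lags behind the master, which the sketch sidesteps by copying everything at once.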
Skills required: excellent C programming and awareness of
concurrency control issues.