A fast, partitioned, distributed hash table for cluster usage

BearbeiterIn:Julian Wenzel
Titel:A fast, partitioned, distributed hash table for cluster usage
Typ:bachelor thesis
Betreuer:Veldema, R.; Philippsen, M.
Status:abgeschlossen am 30. Oktober 2013
Vorausetzungen:
Thema:

Background

Many parallel/distributed applications can be written to use a distributed hash table to perform its communication. Performance for such parallel/distributed applications is then fully dependent on the efficiency of the hash table. To recap, a hash table is a data-structure that presents three functions, get(), put() and contains() as below:

..class Key;
..class Data;
..class Hash {
....Data get(Hash hash, Key key);
....void put(Key key, Data data);
....bool contains(Key key);
..};

A hash is distributed if it allows one machine to perform a put() and a get() on another machine retrieves the data just added to the hash table. Such distributed hash tables have many uses, for example, for peer-2-peer applications. Indeed most distributed hash tables are for such slow Internet uses. For use with a cluster of workstations inside a parallel application, these hash tables are far too slow. For example, internet-enabled hash tables have a lot of support for fault tolerance which we do not need in a high-performance, clustered environment.
There are many possible strategies/optimization techniques for implementing an efficient hash table suitable for a cluster of workstations. For example:
1) When a put() is done, replicate the key on every machine (and retrieve the data associated with it on demand).
2) Every put() buffers a number of keys and performs bulk communication.
3) It is possible to partition the key values into ranges. Each machine is responsible for a range of keys. When executing a put(k,d), the (k, d) pair is sent to machine home(k).
4) Manage the ranges dynamically: when a machine performs many get/put operations with keys in the same range, dynamically make that machine the home for that key range.
5) Compress keys/data that have not been used for a while to reduce memory usage.
6) Store keys/data on disk when they have not been used for a while.
7) Use asynchronous communication operations.
8) Use remote memory copy operations when possible (Remote Direct Memory Acccess, RDMA).
9) If one machine has more keys/data than another machine, a form of load balancing can be performed so that each machine has roughly the same amount of memory usage.
10) Use your imagination!

Topic

At our department, we have a research model checker for Java-like programs. This model checker works by performing a fork an an interpreter (=VM) each time a decision needs to be made. One copy of a VM decides one way, the other copy of the VM decides the other way. This model checker therefore creates many copies of VMs. Unfortunately, this model checker can only use a single machine. To reduce the number of copies, a hash table is used to avoid tracing copies of VMs that have already been seen before. This hash table now needs to be made distributed.


References

Milestones

  • Implement a distributed hash table using a number of heuristics/optimizations. Your hash table should be pluggable to the VERY simple shared-memory hash table that is currently used by our model-checker project.
  • To minimize communication, the hash table should be partitioned. Partitions of the hash table should move between machines so that when a get() with neighboring keys are used, the partition of the hash can be moved to that machine to avoid many inter-machine communication steps.
  • Create a hashcode() method that is good for both our model checker and your distributed hash table. For example, a two-level hashcode() would work. The first level hashcode() would decide which partition of a hash table to use, a second level hashcode() method would decide the hash inside a partition.
  • To perform communication between machines, use the MPI (Message Passing Library, a standard high-performance communication library between cluster machines).
  • Perform extensive benchmarking of your hash table, both with your own tests and with the model-checker.
  • Write the thesis.
watermark seal