PHast -- Perfect Hashing made fast
Piotr Beling, Peter Sanders
TL;DR
PHast targets very fast queries for perfect hash functions while keeping space close to the information-theoretic minimum. It introduces a bucket-placement framework with fixed-width per-bucket seeds and a bumping mechanism, plus a PHast+ variant that uses additive placement for bit-parallel seed searching. Both achieve space below 2 bits per key. In benchmarks against state-of-the-art minimal perfect hash functions (MPHFs), PHast and PHast+ show fast query evaluation and favorable trade-offs between space and construction time, aided by a cache-friendly layout and parallel construction. The work also outlines external-memory extensions and possible directions for GPU acceleration and k-perfect hashing.
Abstract
Perfect hash functions give unique "names" to arbitrary keys while requiring only a few bits per key. They are an essential building block in applications such as static hash tables, databases, and bioinformatics. This paper introduces the PHast approach, which combines the fastest available queries, very fast construction, and good space consumption (below 2 bits per key). PHast improves bucket placement, which first hashes each key k to a bucket and then searches for a bucket seed s such that a placement function maps the pairs (s,k) in a collision-free way. PHast can use small-range hash functions with linear mapping, fixed-width encoding of seeds, and parallel construction. This is achieved by using small overlapping slices of allowed values and by bumping to handle unsuccessful seed assignment. A variant we call PHast+ uses additive placement, which enables bit-parallel seed searching and speeds up construction by an order of magnitude.
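
To make the bucket-placement idea concrete, the following is a minimal Python sketch, not the authors' implementation. It hashes keys to buckets, searches a fixed-width seed per bucket within a small slice of slots, and bumps the keys of buckets for which no seed works. The names (`h`, `slice_start`, `build`, `query`), the 8-bit seed width, the slice length, the use of seed value 0 as a "bumped" marker, and processing buckets in index order are all assumptions made for illustration. Bumped keys are resolved here with an explicit dictionary, which stands in for whatever fallback structure a real implementation would use. Keys are assumed to be distinct.

```python
import hashlib

SEED_BITS = 8   # assumed fixed seed width per bucket
BUMPED = 0      # assumption: seed value 0 marks a bucket whose keys were bumped


def h(key, salt):
    """64-bit hash of (salt, key); a stand-in for any good hash function."""
    data = f"{salt}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def slice_start(b, num_buckets, n, slice_len):
    """Linearly map bucket index b to the first slot of its slice."""
    return b * (n - slice_len) // max(1, num_buckets - 1)


def build(keys, keys_per_bucket=4, slice_len=64):
    n = len(keys)
    num_buckets = max(1, n // keys_per_bucket)
    slice_len = min(slice_len, n)
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[h(k, "bucket") % num_buckets].append(k)

    seeds = [BUMPED] * num_buckets
    taken = [False] * n
    bumped = []
    for b, bucket in enumerate(buckets):
        start = slice_start(b, num_buckets, n, slice_len)
        for s in range(1, 1 << SEED_BITS):
            pos = [start + h(k, s) % slice_len for k in bucket]
            # The seed works if the bucket's keys hit distinct, still-free slots.
            if len(set(pos)) == len(pos) and not any(taken[p] for p in pos):
                for p in pos:
                    taken[p] = True
                seeds[b] = s
                break
        else:
            bumped.extend(bucket)  # no seed fits: bump the whole bucket

    # Stand-in fallback: give bumped keys the slots that are still free.
    free = [p for p in range(n) if not taken[p]]
    fallback = dict(zip(bumped, free))
    return seeds, fallback, n, slice_len


def query(mphf, key):
    seeds, fallback, n, slice_len = mphf
    b = h(key, "bucket") % len(seeds)
    s = seeds[b]
    if s == BUMPED:
        return fallback[key]
    return slice_start(b, len(seeds), n, slice_len) + h(key, s) % slice_len
```

Calling `build(keys)` on a list of distinct keys and then `query(mphf, k)` for each key yields every value in `range(len(keys))` exactly once. A query for a non-bumped key needs only the bucket index, one fixed-width seed lookup, and one more hash evaluation, which is the property that makes this family of schemes fast to evaluate.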
