Space Upper Bounds for $α$-Perfect Hashing

Ryan Song; Emre Telatar

Abstract

In the problem of perfect hashing, we are given a size-$k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1, 2, \ldots, n\}$, for which we wish to construct a hash function $h: [n] \to [b]$ that maps $\mathcal{A}$ to $[b]$ with no collisions, i.e., the restriction of $h$ to $\mathcal{A}$ is injective. When $b=k$, the problem is referred to as minimal perfect hashing. In this paper, we extend the study of minimal perfect hashing to the approximate setting. For $α\in [0, 1]$, we say that a randomized hashing scheme is $α$-perfect if, for any input $\mathcal{A}$ of size $k$, it outputs a hash function that exhibits at most $(1-α)k$ collisions on $\mathcal{A}$ in expectation. One important performance consideration for any hashing scheme is the space required to store the hash function. For minimal perfect hashing, i.e., $b = k$, it is well known that approximately $k\log(e)$ bits, or $\log(e)$ bits per key, are required to store the hash function. In this paper, we propose schemes for constructing minimal $α$-perfect hash functions and analyze their space requirements. We begin by presenting a simple baseline scheme that randomizes between perfect hashing and zero-bit random hashing. We then present a more sophisticated hashing scheme based on sampling, which significantly improves upon the space requirement of the baseline scheme for all values of $α$.
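To make the baseline idea concrete, the following Python sketch estimates what randomizing between perfect hashing and zero-bit random hashing might cost. It rests on assumptions that the abstract does not state: collisions are counted as $k$ minus the number of distinct hash values on $\mathcal{A}$, zero-bit hashing is modeled as a uniformly random function into $[k]$, and $\log$ is taken base 2. The function names and the mixing rule are illustrative, not the paper's construction.

```python
import math
import random


def expected_collisions_random(k):
    """Exact expected collisions when k keys are hashed uniformly into k bins.

    ASSUMPTION: collisions = k - (number of distinct hash values).
    E[#occupied bins] = k * (1 - (1 - 1/k)**k), so the expectation below follows.
    """
    return k * (1.0 - 1.0 / k) ** k


def simulate_collisions(k, trials, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        values = [rng.randrange(k) for _ in range(k)]  # hash values of the k keys
        total += k - len(set(values))
    return total / trials


def baseline_mixing_probability(alpha):
    """Probability p of using a perfect hash so that expected collisions <= (1 - alpha) k.

    Uses the bound E[collisions] <= k / e for random hashing, which gives
    (1 - p) * k / e <= (1 - alpha) * k, i.e. p >= 1 - e * (1 - alpha).
    """
    return max(0.0, 1.0 - math.e * (1.0 - alpha))


def baseline_bits_per_key(alpha):
    """Amortized space of the mixture: p * log2(e) bits per key (zero-bit costs nothing)."""
    return baseline_mixing_probability(alpha) * math.log2(math.e)


if __name__ == "__main__":
    for alpha in (0.5, 1 - 1 / math.e, 0.8, 0.9, 1.0):
        print(f"alpha={alpha:.3f}  bits/key ~ {baseline_bits_per_key(alpha):.4f}")
```

Under these assumptions, random hashing alone already leaves about $k/e$ collisions, so the sketch charges no space for $α \le 1 - 1/e$ and interpolates linearly up to $\log_2 e \approx 1.44$ bits per key at $α = 1$.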

Paper Structure (11 sections, 4 theorems, 46 equations, 1 figure)

Introduction
Main Results
Notation
Problem Formulation
Minimal Perfect and Zero-Bit Hashing
Minimal Perfect Hashing
Minimal Zero-Bit Hashing
Randomizing Between Perfect and Zero-Bit Hashing
Hashing via Sampling
Space Bounds for Minimal $\alpha$-Perfect Hashing
Conclusion

Key Result

Lemma 1

Let $\mathbf{X} = (X_1, \cdots, X_k)$ be a sequence of random variables each taking values on $[k]$ such that Then, the optimal representation length $L^*(n,k,\alpha)$ for minimal $\alpha$-perfect hashing is bounded from above as

Figures (1)

Figure 1: Upper bounds on the optimal amortized representation length $R^*(\alpha)$.

Theorems & Definitions (4)

Lemma 1
Lemma 2
Lemma 3
Theorem 1
