
Space Upper Bounds for $α$-Perfect Hashing

Ryan Song, Emre Telatar

Abstract

In the problem of perfect hashing, we are given a subset $\mathcal{A}$ of size $k$ of a universe of keys $[n] = \{1,2, \cdots, n\}$, for which we wish to construct a hash function $h: [n] \to [b]$ such that $h(\cdot)$ maps $\mathcal{A}$ to $[b]$ with no collisions, i.e., the restriction of $h(\cdot)$ to $\mathcal{A}$ is injective. When $b=k$, the problem is referred to as minimal perfect hashing. In this paper, we extend the study of minimal perfect hashing to the approximate setting. For some $α\in [0, 1]$, we say that a randomized hashing scheme is $α$-perfect if, for any input $\mathcal{A}$ of size $k$, it outputs a hash function which exhibits at most $(1-α)k$ collisions on $\mathcal{A}$ in expectation. One important performance consideration for any hashing scheme is the space required to store the hash function. For minimal perfect hashing, i.e., $b = k$, it is well known that approximately $k\log(e)$ bits, or $\log(e)$ bits per key, are required to store the hash function. In this paper, we propose schemes for constructing minimal $α$-perfect hash functions and analyze their space requirements. We begin by presenting a simple baseline scheme which randomizes between perfect hashing and zero-bit random hashing. We then present a more sophisticated hashing scheme based on sampling which significantly improves upon the space requirement of the baseline strategy for all values of $α$.
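The two reference points in the abstract, roughly $\log(e)$ bits per key for minimal perfect hashing and the collisions incurred by zero-bit random hashing, can be checked numerically. The sketch below is illustrative only and assumes (not stated in the abstract) that logarithms are base 2 and that collisions on $\mathcal{A}$ are counted as $k - |h(\mathcal{A})|$; all helper names are hypothetical. It uses the standard counting heuristic that a uniformly random $h: [n] \to [k]$ is injective on a fixed $k$-set with probability $k!/k^k$, so about $\log_2(k^k/k!) \approx k\log_2 e$ bits suffice to index a working function, together with the exact expected collision count $k(1-1/k)^k \approx k/e$ of a uniformly random function.

```python
import math
import random


def bits_per_key_counting(k: int) -> float:
    """Counting heuristic: log2(k^k / k!) / k bits per key.

    A uniformly random function into [k] is injective on a fixed k-set with
    probability k!/k^k, so indexing the first injective function in a shared
    random sequence costs about log2(k^k / k!) bits.
    """
    ln_ratio = k * math.log(k) - math.lgamma(k + 1)  # ln(k^k / k!)
    return ln_ratio / (k * math.log(2))


def expected_collisions_zero_bit(k: int) -> float:
    """Exact E[k - |h(A)|] for h uniform over maps A -> [k] (assumed collision count)."""
    return k * (1 - 1 / k) ** k


def simulated_collisions_zero_bit(k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[k - |h(A)|] for a uniformly random h."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        image = {rng.randrange(k) for _ in range(k)}  # h(A) as a set of buckets
        total += k - len(image)
    return total / trials


if __name__ == "__main__":
    k = 1000
    print(f"bits/key (counting): {bits_per_key_counting(k):.4f}  vs log2(e) = {math.log2(math.e):.4f}")
    alpha_zero_bit = 1 - expected_collisions_zero_bit(k) / k
    print(f"zero-bit random hashing is about {alpha_zero_bit:.3f}-perfect")
    print(f"simulated collisions: {simulated_collisions_zero_bit(k, 200):.1f}  "
          f"exact: {expected_collisions_zero_bit(k):.1f}")
```

Under these assumptions, for $k = 1000$ the counting estimate is already within about $0.01$ of $\log_2 e \approx 1.4427$ bits per key, and zero-bit random hashing is roughly $(1 - 1/e)$-perfect, i.e., $α \approx 0.632$, at no storage cost.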

Paper Structure (11 sections, 4 theorems, 46 equations, 1 figure)

This paper contains 11 sections, 4 theorems, 46 equations, 1 figure.

Key Result

Lemma 1

Let $\mathbf{X} = (X_1, \cdots, X_k)$ be a sequence of random variables each taking values in $[k]$ such that Then, the optimal representation length $L^*(n,k,\alpha)$ for minimal $\alpha$-perfect hashing is bounded from above as

Figures (1)

  • Figure 1: Upper bounds on the optimal amortized representation length $R^*(\alpha)$.
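To give a rough sense of what an upper bound on $R^*(\alpha)$ looks like, the sketch below works out one reading of the baseline scheme from the abstract: with probability $p$ store a minimal perfect hash function (about $\log_2 e$ bits per key), otherwise store nothing and hash uniformly at random (about $k/e$ expected collisions for large $k$). Requiring $(1-p)k/e \le (1-\alpha)k$ gives $p = \max\{0,\, 1 - e(1-\alpha)\}$. This is a hypothetical illustration, not the curve plotted in Figure 1: it assumes collisions are counted as $k - |h(\mathcal{A})|$, that space is measured as expected bits per key in the limit $k \to \infty$, and the function names are made up for this example.

```python
import math

LOG2_E = math.log2(math.e)  # bits per key for minimal perfect hashing (asymptotic)
# Assumed: alpha reached by zero-bit random hashing as k -> infinity,
# since its expected collisions are about k/e.
ALPHA_0 = 1 - 1 / math.e


def mixing_probability(alpha: float) -> float:
    """Smallest p with (1 - p) * k/e <= (1 - alpha) * k, clipped at 0."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return max(0.0, 1 - math.e * (1 - alpha))


def baseline_rate(alpha: float) -> float:
    """Expected bits per key of the randomized baseline: p * log2(e)."""
    return mixing_probability(alpha) * LOG2_E


if __name__ == "__main__":
    for alpha in (0.5, ALPHA_0, 0.8, 0.9, 1.0):
        print(f"alpha={alpha:.3f}  p={mixing_probability(alpha):.3f}  "
              f"R={baseline_rate(alpha):.3f} bits/key")
```

Under these assumptions the baseline costs nothing for $\alpha \le 1 - 1/e$ and then grows linearly up to $\log_2 e$ bits per key at $\alpha = 1$; the sampling-based scheme described in the abstract is claimed to improve on this baseline for all values of $\alpha$.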

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1