Hamming Distance Oracle

Itai Boneh; Dvir Fried; Shay Golan; Matan Kraus

TL;DR

This work introduces the Hamming Distance Oracle problem: given strings $S$ and $T$, preprocess them to answer Hamming distance (HD) queries between their substrings efficiently. It achieves a tunable preprocessing–query-time trade-off using a suffix-based dynamic programming approach whose cost depends on the time needed to solve text-to-pattern HD queries. This yields $\tilde{O}(nm/x)$ preprocessing and $O(x)$ query time for constant-size alphabets, and $\tilde{O}(nm/\sqrt{x})$ preprocessing with the same query time for general alphabets. A central contribution is a conditional lower bound under a combinatorial matrix-multiplication conjecture, obtained via a reduction from Boolean matrix multiplication, which rules out faster combinatorial trade-offs even in the binary case. The results place near-optimal limits on preprocessing and query-time trade-offs for Hamming distance queries on substrings, and connect the problem to fundamental questions in combinatorial matrix multiplication.

Abstract

In this paper, we present and study the \emph{Hamming distance oracle problem}. In this problem, the task is to preprocess two strings $S$ and $T$ of lengths $n$ and $m$, respectively, to obtain a data structure that can answer queries regarding the Hamming distance between a substring of $S$ and a substring of $T$. For strings over a constant-size alphabet, we show that for every $x\le nm$ there is a data structure with $\tilde{O}(nm/x)$ preprocessing time and $O(x)$ query time. We also provide a combinatorial conditional lower bound, showing that for every $\varepsilon > 0$ and $x \le nm$ there is no data structure with query time $O(x)$ and preprocessing time $O((\frac{nm}{x})^{1-\varepsilon})$ unless combinatorial fast matrix multiplication is possible. For strings over a general alphabet, we present a data structure with $\tilde{O}(nm/\sqrt{x})$ preprocessing time and $O(x)$ query time for every $x \le nm$.
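
To make the query model concrete, here is a minimal Python sketch of the two trivial endpoints of the trade-off: answering a query by direct comparison with no preprocessing, and a table of mismatch counts along diagonals with $O(nm)$ preprocessing and $O(1)$ query time. This is not the data structure from the paper. The names `naive_hd` and `DiagonalPrefixOracle` and the query format `(i, j, length)` for equal-length substrings are assumptions made for illustration.

```python
def naive_hd(S, T, i, j, length):
    """Hamming distance between S[i:i+length] and T[j:j+length].

    No preprocessing; the query costs O(length) character comparisons.
    """
    return sum(1 for k in range(length) if S[i + k] != T[j + k])


class DiagonalPrefixOracle:
    """Illustrative baseline: O(nm) preprocessing, O(1) query.

    D[i][j] counts the mismatches between S[i-k] and T[j-k] for
    k = 1 .. min(i, j), i.e. along the diagonal that ends just before (i, j).
    """

    def __init__(self, S, T):
        n, m = len(S), len(T)
        self.D = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(m):
                # Extend the diagonal count by the pair (S[i], T[j]).
                self.D[i + 1][j + 1] = self.D[i][j] + (S[i] != T[j])

    def query(self, i, j, length):
        # Difference of two counts on the same diagonal.
        return self.D[i + length][j + length] - self.D[i][j]
```

The structures summarized above interpolate between these two extremes: a parameter $x$ reduces preprocessing at the price of $O(x)$ query time.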

Paper Structure (4 sections, 2 theorems, 1 figure)

Introduction
Preliminaries
Hamming Distance Oracle
Lower Bound for the Binary Case of Hamming Distance Oracle

Key Result

Theorem 1

Fix $x\geq 1$, and let $S, T$ be two strings over an alphabet $\Sigma$ with $|S|=n$, $|T|=m$, and $m\leq n$. There exists a data structure for the Hamming Distance Oracle problem with preprocessing time $O(\frac{n}{x}\cdot T_{\mathsf{HD}}(m,x,\Sigma))$ and query time $O(\min(m,x))$.
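
As a hedged sanity check, the bounds stated in the abstract are consistent with Theorem 1 if $T_{\mathsf{HD}}(m,x,\Sigma)$ is read as the time to compute text-to-pattern Hamming distances for a text of length $m$ and a pattern of length $x$. That reading is an assumption not stated in this summary. So are the two running times plugged in below: the classical FFT-based bound $\tilde{O}(m)$ for constant-size alphabets and the Abrahamson-style bound $\tilde{O}(m\sqrt{x})$ for general alphabets.

$$
\frac{n}{x}\cdot \tilde{O}(m) = \tilde{O}\!\left(\frac{nm}{x}\right),
\qquad
\frac{n}{x}\cdot \tilde{O}(m\sqrt{x}) = \tilde{O}\!\left(\frac{nm}{\sqrt{x}}\right).
$$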

Figures (1)

Figure 1: A summary of our results for $n=m$. The $p$-axis corresponds to the exponent of the preprocessing time and the $q$-axis corresponds to the exponent of the query time. For example, we have a general upper bound of $\tilde{O}(n^{1.75})$ preprocessing time and $O(\sqrt{n})$ query time. Note that the lower bound is combinatorial.
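
The example in the caption can be checked by substituting $n=m$ and $x=\sqrt{n}$ into the bounds from the abstract. Writing the preprocessing time as $n^{p}$ and the query time as $n^{q}$ follows the caption's axes. The line equations below are a derived restatement under that convention, ignoring polylogarithmic factors, and are not quoted from the paper.

$$
\frac{nm}{\sqrt{x}} = \frac{n^{2}}{n^{1/4}} = n^{1.75},
\qquad
\frac{nm}{x} = \frac{n^{2}}{n^{1/2}} = n^{1.5}.
$$

$$
x = n^{q}: \qquad p = 2 - \tfrac{q}{2} \ \text{(general alphabets)},
\qquad p = 2 - q \ \text{(constant-size alphabets)}.
$$

Under the same convention, the conditional lower bound excludes combinatorial data structures with $p \le (2-q)(1-\varepsilon)$ for any $\varepsilon > 0$, so the constant-alphabet line $p = 2 - q$ is tight up to lower-order factors.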

Theorems & Definitions (3)

Theorem 1
Conjecture 2: Combinatorial Matrix Multiplication, see GU18
Theorem 3: Lower bound
