Hamming Distance Oracle

Itai Boneh, Dvir Fried, Shay Golan, Matan Kraus

TL;DR

This work introduces the Hamming Distance Oracle problem: given strings $S$ and $T$, preprocess them so that queries asking for the Hamming distance (HD) between a substring of $S$ and a substring of $T$ are answered efficiently. It achieves a tunable preprocessing/query-time trade-off using a suffix-based dynamic programming approach whose cost depends mainly on the time to solve text-to-pattern HD. This yields $\tilde{O}(nm/x)$ preprocessing and $O(x)$ query time for constant-size alphabets, and $\tilde{O}(nm/\sqrt{x})$ preprocessing with the same query time for general alphabets. A central contribution is a conditional lower bound for the Hamming Distance Oracle, obtained via a reduction from Boolean matrix multiplication under a combinatorial matrix-multiplication conjecture, which rules out faster combinatorial trade-offs in the binary case. The results show that these trade-offs are near-optimal for substring Hamming distance queries over constant-size alphabets, and they connect the problem to fundamental questions in combinatorial matrix multiplication.

Abstract

In this paper, we present and study the \emph{Hamming distance oracle problem}. In this problem, the task is to preprocess two strings $S$ and $T$ of lengths $n$ and $m$, respectively, into a data structure that can answer queries about the Hamming distance between a substring of $S$ and a substring of $T$. For strings over a constant-size alphabet, we show that for every $x\le nm$ there is a data structure with $\tilde{O}(nm/x)$ preprocessing time and $O(x)$ query time. We also provide a combinatorial conditional lower bound, showing that for every $\varepsilon > 0$ and $x \le nm$ there is no data structure with query time $O(x)$ and preprocessing time $O((\frac{nm}{x})^{1-\varepsilon})$ unless combinatorial fast matrix multiplication is possible. For strings over a general alphabet, we present a data structure with $\tilde{O}(nm/\sqrt{x})$ preprocessing time and $O(x)$ query time for every $x \le nm$.
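To make the query semantics concrete, here is a minimal sketch of the simplest point on the trade-off (roughly $x = 1$): for every diagonal $d = i - j$, store prefix sums of mismatches between $S$ and $T$, so that each query costs two lookups. The class name `DiagonalPrefixOracle` and the signature `query(i, j, length)`, which compares $S[i..i+\text{length})$ with $T[j..j+\text{length})$ using 0-based indices, are hypothetical choices for illustration and are not an interface from the paper.

```python
from typing import Dict, List


class DiagonalPrefixOracle:
    """Illustrative baseline (hypothetical name): O(nm) preprocessing, O(1) query."""

    def __init__(self, S: str, T: str) -> None:
        n, m = len(S), len(T)
        # For each shift d = i - j, prefix[d][u] counts mismatches S[d + t] != T[t]
        # over the valid indices t < u on that diagonal.
        self.prefix: Dict[int, List[int]] = {}
        for d in range(-(m - 1), n):
            start = max(0, -d)       # first index t into T that lies on diagonal d
            end = min(m, n - d)      # one past the last such index
            counts = [0] * (end + 1)
            for t in range(start, end):
                counts[t + 1] = counts[t] + (S[d + t] != T[t])
            self.prefix[d] = counts

    def query(self, i: int, j: int, length: int) -> int:
        """Hamming distance between S[i:i+length] and T[j:j+length]."""
        if length == 0:
            return 0
        counts = self.prefix[i - j]
        return counts[j + length] - counts[j]


oracle = DiagonalPrefixOracle("abcab", "acb")
print(oracle.query(0, 0, 3))  # "abc" vs "acb" -> 2
```

With $m \le n$ there are $n + m - 1$ diagonals of at most $m + 1$ entries each, so this sketch uses $O(nm)$ preprocessing time and space and answers each query in $O(1)$.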

Paper Structure (4 sections, 2 theorems, 1 figure)

This paper contains 4 sections, 2 theorems, and 1 figure.

Key Result

Theorem 1

Fix $x\geq 1$, and let $S$ and $T$ be strings over an alphabet $\Sigma$ with $|S|=n$, $|T|=m$, and $m\leq n$. Then there exists a data structure for the Hamming distance oracle problem with preprocessing time $O(\frac{n}{x}\cdot T_{\mathsf{HD}}(m,x,\Sigma))$ and query time $O(\min(m,x))$.
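One way to see how a bound of this shape can arise is sketched below. It is an illustrative reconstruction, not necessarily the authors' construction, and it assumes that $T_{\mathsf{HD}}(m,x,\Sigma)$ denotes the time of text-to-pattern Hamming distance with a text of length $m$ and a pattern of length $x$. The idea is to split $S$ into blocks of length $x$ and run one text-to-pattern computation per block against $T$, which gives $n/x$ calls. The block counts are then stored as prefix sums along each diagonal $d = i - j$. A query sums the whole blocks it covers in $O(1)$ and compares the fewer than $x$ leftover characters at each end directly; if no whole block fits, the query is shorter than $2x$ and is compared directly. `text_to_pattern_hd` is a hypothetical naive stand-in for the $T_{\mathsf{HD}}$ subroutine, and `BlockedHDOracle` is likewise a hypothetical name.

```python
from typing import Dict, List


def text_to_pattern_hd(text: str, pattern: str) -> List[int]:
    """Naive stand-in for T_HD: out[j] = HD(pattern, text[j:j+len(pattern)]).

    Runs in O(|text| * |pattern|); a faster text-to-pattern Hamming distance
    routine (e.g. FFT-based for constant alphabets) could be swapped in.
    """
    k = len(pattern)
    return [sum(p != c for p, c in zip(pattern, text[j:j + k]))
            for j in range(len(text) - k + 1)]


class BlockedHDOracle:
    """Hypothetical sketch: n/x text-to-pattern calls, O(min(length, x)) queries."""

    def __init__(self, S: str, T: str, x: int) -> None:
        assert x >= 1
        self.S, self.T, self.x = S, T, x
        self.first_block: Dict[int, int] = {}   # diagonal d -> first block index on d
        self.prefix: Dict[int, List[int]] = {}  # diagonal d -> prefix sums of block HDs
        for b in range(len(S) // x):
            block = S[b * x:(b + 1) * x]
            # One text-to-pattern call per full block of S; blocks are visited in
            # increasing order, so each diagonal's list is built left to right.
            for j, hd in enumerate(text_to_pattern_hd(T, block)):
                d = b * x - j  # block b aligned with T[j:j+x] lies on diagonal d
                if d not in self.prefix:
                    self.first_block[d] = b
                    self.prefix[d] = [0]
                self.prefix[d].append(self.prefix[d][-1] + hd)

    def _direct(self, i: int, j: int, length: int) -> int:
        return sum(a != c for a, c in zip(self.S[i:i + length], self.T[j:j + length]))

    def query(self, i: int, j: int, length: int) -> int:
        """Hamming distance between S[i:i+length] and T[j:j+length]."""
        x = self.x
        b_start = -(-i // x)          # first block starting at or after i
        b_end = (i + length) // x     # blocks b < b_end end by i + length
        if b_start >= b_end:          # no full block inside, so length < 2x
            return self._direct(i, j, length)
        d = i - j
        lo = self.first_block[d]
        block_sum = self.prefix[d][b_end - lo] - self.prefix[d][b_start - lo]
        head = b_start * x - i        # fewer than x characters before the first block
        tail = b_end * x - i          # offset of the fewer than x trailing characters
        return (self._direct(i, j, head) + block_sum
                + self._direct(i + tail, j + tail, length - tail))
```

With the naive stand-in, preprocessing is still $O(nm)$ overall. The point of the blocking is that preprocessing makes only $n/x$ text-to-pattern calls plus $O(nm/x)$ prefix-sum work, so plugging in a faster subroutine lowers the preprocessing cost, while each query touches $O(\min(\text{length}, x))$ characters.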

Figures (1)

  • Figure 1: A summary of our results for $n=m$. The $p$-axis corresponds to the exponent of the preprocessing time, and the $q$-axis corresponds to the exponent of the query time. For example, for general alphabets we have an upper bound of $\tilde{O}(n^{1.75})$ preprocessing time with $O(\sqrt{n})$ query time. Note that the lower bound is combinatorial.
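The exponents plotted in Figure 1 follow from the abstract's bounds by substituting $n = m$ and $x = n^{q}$ and ignoring polylogarithmic factors. The derivation below is only a worked check of that arithmetic, not an additional result.

```latex
\begin{aligned}
&\text{constant } |\Sigma|: && \tilde{O}\!\left(\tfrac{nm}{x}\right) = \tilde{O}\!\left(n^{2-q}\right) && \Rightarrow\ p = 2 - q,\\
&\text{general } \Sigma: && \tilde{O}\!\left(\tfrac{nm}{\sqrt{x}}\right) = \tilde{O}\!\left(n^{2-q/2}\right) && \Rightarrow\ p = 2 - \tfrac{q}{2},\\
&\text{lower bound:} && O\!\left(\left(\tfrac{nm}{x}\right)^{1-\varepsilon}\right) = O\!\left(n^{(2-q)(1-\varepsilon)}\right)\ \text{excluded for all } \varepsilon > 0 && \Rightarrow\ p \ge 2 - q \ \text{(combinatorial)}.
\end{aligned}
```

For $q = \tfrac12$ the general-alphabet line gives $p = 2 - \tfrac14 = 1.75$, which matches the example in the caption. The constant-alphabet line coincides with the combinatorial lower bound.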

Theorems & Definitions (3)

  • Theorem 1
  • Conjecture 2: Combinatorial Matrix Multiplication, see GU18
  • Theorem 3: Lower bound
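To illustrate why Hamming distance queries can encode Boolean matrix multiplication, the sketch below relies on the identity $\mathrm{HD}(a,b) = |a| + |b| - 2\langle a, b\rangle$ for binary vectors $a, b$, where $|a|$ counts ones. It concatenates the rows of a $k \times k$ matrix $A$ into $S$ and the columns of $B$ into $T$. One query per entry then recovers the inner product, and therefore the Boolean entry $C[r][c]$. This naive encoding uses strings of length $k^2$ and $k^2$ queries of length $k$. It is only an illustration and is not claimed to be the reduction behind Theorem 3. The local `hd` function is a brute-force stand-in for an oracle query, and `bool_product_via_hd` is a hypothetical name.

```python
from typing import List

Matrix = List[List[int]]


def bool_product_via_hd(A: Matrix, B: Matrix) -> Matrix:
    """Boolean product of square 0/1 matrices, read off from substring HD queries."""
    k = len(A)
    # S concatenates the rows of A; T concatenates the columns of B.
    S = "".join(str(v) for row in A for v in row)
    T = "".join(str(B[r][c]) for c in range(k) for r in range(k))

    def hd(i: int, j: int, length: int) -> int:
        # Brute-force stand-in for a Hamming distance oracle query.
        return sum(a != b for a, b in zip(S[i:i + length], T[j:j + length]))

    C = [[0] * k for _ in range(k)]
    for r in range(k):
        ones_row = S[r * k:(r + 1) * k].count("1")
        for c in range(k):
            ones_col = T[c * k:(c + 1) * k].count("1")
            # HD = |a| + |b| - 2<a, b>, so the inner product is recovered exactly.
            inner = (ones_row + ones_col - hd(r * k, c * k, k)) // 2
            C[r][c] = int(inner > 0)
    return C
```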