Constant Rate Isometric Embeddings of Hamming Metric into Edit Metric

Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, Mursalin Habib, Bernhard Haeupler, Karthik C. S., Michal Koucký

TL;DR

This work resolves a central question in metric embeddings by constructing a constant-rate isometric embedding of the Hamming metric into the edit metric, achieving a rate of $1/8$ for binary strings via a novel framework built from misaligners and locally self-matching strings, which draw on ideas from synchronization strings. A key structural insight is that any isometric embedding must be interleaved, which yields upper bounds on the attainable rate (e.g., at most $15/32$ for binary alphabets) and generalizes to larger alphabets. The paper also shows that allowing the input and output alphabets to differ can push the rate arbitrarily close to $1$, where the rate is measured as $\frac{n\log|\Sigma_{\text{in}}|}{N\log|\Sigma_{\text{out}}|}$ to account for the two alphabet sizes. These results yield immediate conditional hardness implications for edit-metric problems and illuminate fundamental trade-offs between rate, alphabet size, and isometry in Hamming-to-edit embeddings. Overall, the work advances the theory and construction of high-rate embeddings and opens new directions for embedding strategies in string metrics.

Abstract

A function $\varphi: \{0,1\}^n \to \{0,1\}^N$ is called an isometric embedding of the $n$-dimensional Hamming metric space to the $N$-dimensional edit metric space if, for all $x, y \in \{0,1\}^n$, the Hamming distance between $x$ and $y$ is equal to the edit distance between $\varphi(x)$ and $\varphi(y)$. The rate of such an embedding is defined as the ratio $n/N$. It is well known in the literature how to construct isometric embeddings with a rate of $\Omega(\frac{1}{\log n})$. However, achieving even near-isometric embeddings with a positive constant rate has remained elusive until now. In this paper, we present an isometric embedding with a rate of $1/8$ by discovering connections to synchronization strings, which were studied in the context of insertion-deletion codes (Haeupler-Shahrasbi [JACM'21]). At a technical level, we introduce a framework for obtaining high-rate isometric embeddings using a novel object called misaligners. As an immediate consequence of our constant rate isometric embedding, we improve known conditional lower bounds for various optimization problems in the edit metric, but now with optimal dependency on the dimension. We complement our results by showing that no isometric embedding $\varphi:\{0, 1\}^n \to \{0, 1\}^N$ can have rate greater than $15/32$ for all positive integers $n$. En route to proving this upper bound, we uncover fundamental structural properties necessary for every Hamming-to-edit isometric embedding. We also prove similar upper and lower bounds for embeddings over larger alphabets. Finally, we consider embeddings $\varphi:\Sigma_{\text{in}}^n\to \Sigma_{\text{out}}^N$ between different input and output alphabets, where the rate is given by $\frac{n\log|\Sigma_{\text{in}}|}{N\log|\Sigma_{\text{out}}|}$. In this setting, we show that the rate can be made arbitrarily close to 1.
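
The definitions in the abstract can be checked by brute force for very small $n$. The sketch below is illustrative only and is not the paper's construction: it assumes the edit distance is the standard Levenshtein distance (insertions, deletions, substitutions), and the helper names `hamming`, `edit_distance`, `is_isometric`, and `rate` are hypothetical. It shows that the identity map is isometric for $n = 2$ but not for $n = 3$, which is why a nontrivial embedding is needed.

```python
import math
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def is_isometric(phi, n: int, alphabet: str = "01") -> bool:
    """Check Hamming(x, y) == ED(phi(x), phi(y)) over all pairs (exponential in n)."""
    words = ["".join(w) for w in product(alphabet, repeat=n)]
    return all(hamming(x, y) == edit_distance(phi(x), phi(y))
               for x in words for y in words)


def rate(n: int, N: int, in_size: int = 2, out_size: int = 2) -> float:
    """Rate n*log|Sigma_in| / (N*log|Sigma_out|); equals n/N for equal alphabets."""
    return (n * math.log2(in_size)) / (N * math.log2(out_size))


def identity(x: str) -> str:
    return x


print(is_isometric(identity, 2))     # identity works for n = 2
print(is_isometric(identity, 3))     # fails: 010 vs 101 has Hamming 3, edit 2
print(rate(1, 8))                    # the abstract's binary rate, 1/8
```

The failing pair for $n = 3$ is $010$ and $101$: deleting the leading $0$ and appending a $1$ costs two edits, while the Hamming distance is three. Shifts of this kind are what an isometric embedding has to make expensive.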

Paper Structure

This paper contains 35 sections, 30 theorems, 69 equations, 11 figures.

Key Result

Theorem 1.1

There exists a universal constant $C\ge 1$ such that for every positive integer $n$, there is an isometric embedding $\varphi_{n}:\{0,1\}^n\to\{0,1\}^{Cn}$ of the Hamming metric into the edit metric.
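
To connect this statement to the rate defined in the abstract: with output length $N = Cn$, the rate does not depend on $n$,

$$\text{rate}(\varphi_n) \;=\; \frac{n}{N} \;=\; \frac{n}{Cn} \;=\; \frac{1}{C}.$$

The rate of $1/8$ quoted in the abstract would correspond to $N = 8n$, though the theorem as stated here only asserts that some constant $C$ exists.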

Figures (11)

  • Figure 1: An optimal alignment converting the embedded string $X$ to the embedded string $Y$. Note the alternating maximal nowhere-vertical and vertical intervals (highlighted blue and red, respectively) $I_1, I_2, I_3, I_4$ and $I_5$.
  • Figure 2: A nowhere-vertical alignment on some interval in the embedded strings $X$ and $Y$ implies a self-alignment of a substring $S$ of the $\varepsilon$-synchronization string with the same cost.
  • Figure 3: For short intervals, the misaligner guarantees isometry: any nowhere-vertical edit alignment must pay at least as much as the Hamming distance, no matter how the wildcards are instantiated.
  • Figure 4: The blocks $b_i$ and $b'_{i'}$ as well as $b_j$ and $b'_{j'}$ and $b_k$ and $b'_{k'}$ are instantiations of the same codeword and appear in different positions in $X$ and $Y$. The bad blocks --- those where all characters are matched vertically --- are highlighted in red.
  • Figure 5: The alignment transforms every block $c$ into a substring $s$ arising from the concatenation of multiple blocks. If $c$ and $s$ are roughly the same size, the misaligner guarantees the relative edit distance between $c$ and $s$ is at least 0.2. Observe that in case (b), the alignment has no vertical edges between $b$ and its counterpart in $s$.
  • ...and 6 more figures

Theorems & Definitions (70)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5: Informal statement of the main theorem
  • Corollary 1.6
  • Theorem 1.7
  • Corollary 1.8
  • Theorem 1.9: Isometry implies Interleaving
  • Theorem 1.10
  • ...and 60 more