Constant Rate Isometric Embeddings of Hamming Metric into Edit Metric
Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, Mursalin Habib, Bernhard Haeupler, Karthik C. S., Michal Koucký
TL;DR
This work resolves a central question in metric embeddings by constructing a constant-rate isometric embedding of the Hamming metric into the edit metric, achieving a rate of $1/8$ for binary strings via a novel framework built from misaligners and locally self-matching strings that draw on synchronization-string ideas. A key structural insight is that any isometric embedding must be interleaved, which yields upper bounds on the attainable rate (at most $15/32$ for binary alphabets) and a generalization to larger alphabets. The paper also shows that when the input and output alphabets are allowed to differ, the rate, normalized by the information content of each alphabet, can be pushed arbitrarily close to $1$. These results yield immediate conditional hardness implications for edit-metric problems and expose the trade-offs between rate, alphabet size, and isometry in Hamming-to-edit embeddings.
Abstract
A function $\varphi: \{0,1\}^n \to \{0,1\}^N$ is called an isometric embedding of the $n$-dimensional Hamming metric space to the $N$-dimensional edit metric space if, for all $x, y \in \{0,1\}^n$, the Hamming distance between $x$ and $y$ is equal to the edit distance between $\varphi(x)$ and $\varphi(y)$. The rate of such an embedding is defined as the ratio $n/N$. It is well known in the literature how to construct isometric embeddings with a rate of $\Omega(\frac{1}{\log n})$. However, achieving even near-isometric embeddings with a positive constant rate has remained elusive until now. In this paper, we present an isometric embedding with a rate of $1/8$ by discovering connections to synchronization strings, which were studied in the context of insertion-deletion codes (Haeupler-Shahrasbi [JACM'21]). At a technical level, we introduce a framework for obtaining high-rate isometric embeddings using a novel object called misaligners. As an immediate consequence of our constant-rate isometric embedding, we improve known conditional lower bounds for various optimization problems in the edit metric, now with optimal dependence on the dimension. We complement our results by showing that no isometric embedding $\varphi:\{0, 1\}^n \to \{0, 1\}^N$ can have rate greater than $15/32$ for all positive integers $n$. En route to proving this upper bound, we uncover fundamental structural properties necessary for every Hamming-to-edit isometric embedding. We also prove similar upper and lower bounds for embeddings over larger alphabets. Finally, we consider embeddings $\varphi:\Sigma_{\text{in}}^n\to \Sigma_{\text{out}}^N$ between different input and output alphabets, where the rate is given by $\frac{n\log|\Sigma_{\text{in}}|}{N\log|\Sigma_{\text{out}}|}$. In this setting, we show that the rate can be made arbitrarily close to 1.
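The definition of an isometric embedding can be checked by brute force for very small $n$. The sketch below is illustrative only and is not the paper's construction: the function names (`hamming`, `edit_distance`, `is_isometric`) are hypothetical, and edit distance is assumed to count insertions, deletions, and substitutions at unit cost. It shows that the identity map is isometric for $n = 2$ but already fails for $n = 3$, since a shift can align `010` with `101` using two edits although their Hamming distance is three.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def edit_distance(s: str, t: str) -> int:
    """Edit distance by dynamic programming.

    Assumption: insertions, deletions, and substitutions each cost 1.
    """
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def is_isometric(phi, n: int) -> bool:
    """Exhaustively test whether phi preserves distances on {0,1}^n.

    Compares all 4^n ordered pairs, so this is only feasible for tiny n.
    """
    words = ["".join(bits) for bits in product("01", repeat=n)]
    return all(hamming(x, y) == edit_distance(phi(x), phi(y))
               for x in words for y in words)


identity = lambda x: x
print(is_isometric(identity, 2))
print(is_isometric(identity, 3))
```

The failure at $n = 3$ illustrates why an embedding has to add redundancy that penalizes shifted alignments, which is what forces the rate $n/N$ below 1 in the binary setting.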
