Approximation Schemes for Edit Distance and LCS in Quasi-Strongly Subquadratic Time

Xiao Mao, Aviad Rubinstein

Abstract

We present novel randomized approximation schemes for the Edit Distance (ED) problem and the Longest Common Subsequence (LCS) problem that, for any constant $ε>0$, compute a $(1+ε)$-approximation for ED and a $(1-ε)$-approximation for LCS in time $n^2 / 2^{\log^{Ω(1)}(n)}$ for two strings of total length at most $n$. This running time improves upon the classical quadratic-time dynamic programming algorithms by a quasi-polynomial factor. Our results yield significant insights into fine-grained complexity. First, for ED, prior work indicates that no exact algorithm can improve on the quadratic running time by more than a few logarithmic factors without refuting established complexity assumptions [Abboud, Hansen, Vassilevska Williams, Williams, 2016]; our quasi-polynomial speed-up separates the complexity of approximate ED from that of exact ED, even for approximation factors arbitrarily close to $1$. Second, for LCS, obtaining similar approximation-time tradeoffs via deterministic algorithms would imply breakthrough circuit lower bounds [Chen, Goldwasser, Lyu, Rothblum, Rubinstein, 2019]; our randomized algorithm therefore demonstrates the hardness of derandomization for LCS approximation.
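For reference, the classical quadratic-time dynamic programs that serve as the baseline can be sketched as follows. This is a minimal illustration of the textbook exact algorithms, not the approximation schemes of this paper; the function names `edit_distance` and `lcs_length` are chosen here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact edit distance (insert, delete, substitute) in O(len(a) * len(b)) time."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Exact LCS length in O(len(a) * len(b)) time."""
    # prev[j] holds the LCS length of the current prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(lcs_length("ABCBDAB", "BDCABA"))
```

Both routines keep only two rows of the table, so they use linear space while still filling in all $Θ(n^2)$ cells; that cell count is the cost the abstract's $n^2 / 2^{\log^{Ω(1)}(n)}$ bound improves upon.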

Paper Structure

This paper contains 59 sections, 44 theorems, 171 equations, 6 figures, 2 tables, 3 algorithms.

Key Result

Lemma 3.1

Let $X = (x_1, \ldots, x_N)$ be $N$ independent random variables in $[-c, c]$. Then, for all $\delta > 0$, where $\mu = \operatorname*{\mathbf{E}}\left[ \sum_{i = 1}^{N} x_i \right]$.
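The displayed inequality of this lemma is not reproduced above. The hypotheses (independent variables bounded in $[-c, c]$, deviation of the sum from its mean $\mu$) match those of Hoeffding's inequality, so as an assumption, and not as a quotation of the paper's statement, a bound of the following standard form would fit:

$$\Pr\left[ \left| \sum_{i = 1}^{N} x_i - \mu \right| \geq \delta \right] \leq 2 \exp\left( -\frac{\delta^2}{2 N c^2} \right).$$

This is Hoeffding's bound $2\exp\left(-2\delta^2 / \sum_{i}(b_i - a_i)^2\right)$ specialized to $b_i - a_i = 2c$ for every $i$; the paper's exact constants and parametrization may differ.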

Figures (6)

  • Figure 1: Edit distance as a shortest path on 2D grid (standard)
  • Figure 2: Convenient coordinate system
  • Figure 3: Sub-sampling
  • Figure 4: Overfitting Issue With Naïve Sub-sampling Approach for ED
  • Figure 5: In this example, the branching factor is $M = 4$. Naïvely, all three paths are candidate paths. However, in our approach we only consider as a valid candidate the middle path, which is rounded to the straight line from A to E on "anchors" B, C, and D.
  • ...and 1 more figure

Theorems & Definitions (107)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 4.0
  • Claim 4.1
  • proof
  • proof: Proof of \ref{lemma:totalmeandeviation}
  • Lemma 4.2
  • proof
  • ...and 97 more