Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares

Gabriel Bathie, Tomasz Kociumaka, Tatiana Starikovskaya

TL;DR

This paper introduces online algorithms for the language distance problem with respect to PAL and SQ under both Hamming and edit distances in the low-distance regime. It develops randomized streaming methods using $\mathrm{poly}(k,\log n)$ space and per-character time for all four problems, and complements them with deterministic read-only online algorithms that also run in $\mathrm{poly}(k,\log n)$ resources. A key contribution is the use of self-similarity properties, Hamming-distance sketches, and prefix-based filtering to achieve near-optimal space-time trade-offs; for the LED variants, the work leverages locally consistent decompositions and RSLSLP-based schemata to obtain sublinear-space streaming and strong read-only guarantees. The results collectively advance small-space online solutions for PAL and SQ distance problems and provide foundational techniques for pattern-matching with errors and language-edit-distance computations in streaming and read-only settings.

Abstract

We study the online variant of the language distance problem for two classical formal languages, the language of palindromes and the language of squares, and for the two most fundamental distances, the Hamming distance and the edit (Levenshtein) distance. In this problem, defined for a fixed formal language $L$, we are given a string $T$ of length $n$, and the task is to compute the minimal distance to $L$ from every prefix of $T$. We focus on the low-distance regime, where one must compute only the distances smaller than a given threshold $k$. In this work, our contribution is twofold:

- First, we show streaming algorithms, which access the input string $T$ only through a single left-to-right scan. Both for palindromes and squares, our algorithms use $O(k \cdot \mathrm{poly}\log n)$ space and time per character in the Hamming-distance case and $O(k^2 \cdot \mathrm{poly}\log n)$ space and time per character in the edit-distance case. These algorithms are randomised by necessity, and they err with probability inverse-polynomial in $n$.
- Second, we show deterministic read-only online algorithms, which are also provided with read-only random access to the already processed characters of $T$. Both for palindromes and squares, our algorithms use $O(k \cdot \mathrm{poly}\log n)$ space and time per character in the Hamming-distance case and $O(k^4 \cdot \mathrm{poly}\log n)$ space and amortised time per character in the edit-distance case.
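To make the problem statement concrete, the following is a naive baseline for the Hamming-distance variants, not one of the paper's algorithms. It stores the whole input and spends linear time per prefix, which is exactly what the small-space algorithms avoid. The function names are hypothetical, and the convention of reporting `None` when the distance exceeds `k` (or when no square of the same length exists) is an assumption made for illustration.

```python
def hamming_to_pal(s):
    # Each mismatched mirror pair (i, n-1-i) needs exactly one substitution.
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != s[n - 1 - i])


def hamming_to_sq(s):
    # A square has even length, so an odd-length string has no square
    # at finite Hamming distance; None stands for "infinite" here.
    n = len(s)
    if n % 2 == 1:
        return None
    half = n // 2
    return sum(1 for i in range(half) if s[i] != s[i + half])


def prefix_distances(text, k, dist):
    # For every prefix, report the distance if it is at most k, else None.
    # (Assumed reporting convention; the threshold boundary is illustrative.)
    out = []
    for j in range(1, len(text) + 1):
        d = dist(text[:j])
        out.append(d if d is not None and d <= k else None)
    return out
```

For example, `prefix_distances("abca", 1, hamming_to_pal)` inspects the prefixes `a`, `ab`, `abc`, `abca` and returns `[0, 1, 1, 1]`.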

Paper Structure (26 sections, 25 theorems, 8 equations, 2 figures, 1 table)

Key Result

Theorem 4

There is a randomised streaming algorithm that solves the $k$-LHD-PAL problem for a string $T \in \Sigma^n$ using $O(k\log n)$ bits of space and $O(k\log^3 n)$ time per character. The algorithm errs with probability inverse-polynomial in $n$.
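For intuition about why randomisation helps in the streaming setting, the sketch below handles only the special case $k = 0$, that is, recognising which prefixes are exact palindromes. It uses the standard Karp-Rabin fingerprinting technique rather than the algorithm of Theorem 4, and the function name, modulus, and character encoding are assumptions chosen for illustration. It keeps a constant number of machine words and may report a false positive with small probability, but never a false negative.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the fingerprints (illustrative choice)


def palindromic_prefixes(stream, seed=None):
    """Yield True after each character iff the prefix read so far is
    (with high probability) a palindrome. Only three integers are stored."""
    rng = random.Random(seed)
    r = rng.randrange(2, MERSENNE_PRIME - 1)  # random evaluation point
    fwd = 0    # fingerprint of the prefix:          sum s[i] * r^i
    rev = 0    # fingerprint of the reversed prefix: sum s[i] * r^(n-1-i)
    power = 1  # r^n for the current prefix length n
    for ch in stream:
        c = ord(ch) + 1  # shift so that no character maps to 0
        fwd = (fwd + c * power) % MERSENNE_PRIME
        rev = (rev * r + c) % MERSENNE_PRIME
        power = (power * r) % MERSENNE_PRIME
        # A palindrome equals its reverse, so the fingerprints must agree.
        yield fwd == rev
```

Calling `list(palindromic_prefixes("abacaba"))` marks the prefixes `a`, `aba`, and `abacaba` as palindromes. Handling $k > 0$ mismatches requires richer sketches than a single fingerprint, which is where the space bound of the theorem comes from.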

Figures (2)

  • Figure 1: Illustration of our filtering procedure. Here, $P'$ is a $k$-mismatch occurrence of $P_j$ at position $i + \ell_j$ in $T$ and position $p = i -\ell_j/2$ in $T_j$, reported with delay $\Delta_p = p-\ell_j/2$ in $T_j$, hence it arrives at time $2i$ in $T$.
  • Figure 2: Decomposition of $U = VW$.

Theorems & Definitions (39)

  • Theorem 4
  • Theorem 5
  • Example 7
  • Definition 8: clifford2019streaming
  • Corollary 9: of clifford2019streaming
  • Lemma 9
  • Claim 9
  • Proposition 9
  • Definition 10: charalampopoulos2020faster
  • Claim 11: From kociumaka2022small
  • ...and 29 more