
Internal Pattern Matching in Small Space and Applications

Gabriel Bathie, Panagiotis Charalampopoulos, Tatiana Starikovskaya

TL;DR

This work studies Internal Pattern Matching (IPM), a core internal-query primitive, in the read-only small-space model and delivers a near-optimal space–time trade-off: a data structure of size $\tilde{O}(n/\tau)$ that can be built in $\tilde{O}(n)$ time and answers IPM queries in $O(\tau + \log n \log^3 \log n)$ time. The construction relies on $\tau$-partitioning anchors, sparse suffix trees, and 3D range searching to locate anchored occurrences efficiently. The authors connect IPM to broader problems, enabling space-efficient solutions for the longest common substring (LCS) and circular pattern matching (CPM) problems, and provide both lower bounds and streaming/asymmetric-streaming results that illuminate the limits and capabilities of small-space string processing. They also show a constant-space CPM solution in the read-only setting for certain input sizes and conditions, highlighting practical implications for read-only, space-constrained environments.
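To make the idea of indexing only a subset of suffixes concrete, the following Python sketch builds a sparse suffix array (a sorted list of selected suffix start positions, used here as a simpler stand-in for a sparse suffix tree) and reports the selected positions at which a query string occurs. The sampling of every second position in the usage example is an assumption for illustration only; it is not a $\tau$-partitioning set, and the naive slicing-based sort uses far more than $\tilde{O}(n/\tau)$ space, so this shows what is indexed, not how the paper builds it. The helper names are hypothetical.

```python
from typing import List


def sparse_suffix_array(S: str, positions: List[int]) -> List[int]:
    """Sort the selected suffix start positions lexicographically by S[i:].

    Naive illustration: slicing copies suffixes, so this is neither
    linear-time nor small-space.
    """
    return sorted(positions, key=lambda i: S[i:])


def anchored_matches(S: str, ssa: List[int], Q: str) -> List[int]:
    """Return the sampled positions i (increasing) such that S[i:] starts with Q.

    Sampled suffixes sharing the prefix Q occupy a contiguous block of the
    sorted array, located here with two binary searches.
    """
    m = len(Q)
    # Lower bound: first sampled suffix that is >= Q.
    lo, hi = 0, len(ssa)
    while lo < hi:
        mid = (lo + hi) // 2
        if S[ssa[mid]:] < Q:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first sampled suffix whose length-m prefix is > Q.
    hi = len(ssa)
    while lo < hi:
        mid = (lo + hi) // 2
        if S[ssa[mid]:ssa[mid] + m] <= Q:
            lo = mid + 1
        else:
            hi = mid
    return sorted(ssa[start:lo])


if __name__ == "__main__":
    S = "abracadabra"
    ssa = sparse_suffix_array(S, list(range(0, len(S), 2)))  # hypothetical sampling
    print(ssa)                            # [10, 0, 8, 4, 6, 2]
    print(anchored_matches(S, ssa, "a"))  # [0, 10]
```

Because truncating two sorted strings to the same length preserves their order, the second binary search is valid over the block that starts at the lower bound.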

Abstract

In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal Pattern Matching (IPM) problem, where the goal is to construct a data structure over a string $S$ of length $n$ that allows one to answer queries of the following type: compute the occurrences of a fragment $P$ of $S$ inside another fragment $T$ of $S$, provided that $|T| < 2|P|$. For any $\tau \in [1 .. n/\log^2 n]$, we present a nearly-optimal $\tilde{O}(n/\tau)$-size data structure that can be built in $\tilde{O}(n)$ time using $\tilde{O}(n/\tau)$ extra space, and answers IPM queries in $O(\tau+\log n \log^3 \log n)$ time. IPM queries have been identified as a crucial primitive operation for the analysis of algorithms on strings. In particular, the complexities of several recent algorithms for approximate pattern matching are expressed with regard to the number of calls to a small set of primitive operations that include IPM queries; our data structure allows us to port these results to the small-space setting. We further showcase the applicability of our IPM data structure by using it to obtain space-time trade-offs for the longest common substring and circular pattern matching problems in the asymmetric streaming setting.
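The query semantics can be pinned down with a brute-force reference implementation. The Python sketch below only specifies the answer an IPM data structure must return; it is not the paper's method, since it scans $T$ directly in $O(|T| \cdot |P|)$ time. It relies on a standard fact about this setting: when $|T| < 2|P|$, the occurrences of $P$ in $T$ form a single arithmetic progression, so the answer fits in constant space as (first, step, count). The function name `ipm_naive` and the half-open fragment convention are assumptions made for this illustration.

```python
from typing import Tuple


def ipm_naive(S: str, p_start: int, p_end: int,
              t_start: int, t_end: int) -> Tuple[int, int, int]:
    """Brute-force IPM query (hypothetical helper, half-open fragments).

    Returns the starting positions in S of the occurrences of
    P = S[p_start:p_end] inside T = S[t_start:t_end] as an arithmetic
    progression (first, step, count); (-1, 0, 0) means no occurrence.
    """
    m = p_end - p_start
    if m <= 0 or t_end - t_start >= 2 * m:
        raise ValueError("IPM queries require a non-empty P with |T| < 2|P|")
    P = S[p_start:p_end]
    # Candidate starts i keep the whole occurrence S[i:i+m] inside T.
    occ = [i for i in range(t_start, t_end - m + 1) if S.startswith(P, i)]
    if not occ:
        return (-1, 0, 0)
    step = occ[1] - occ[0] if len(occ) > 1 else 0
    # Because |T| < 2|P|, consecutive occurrences are equally spaced.
    assert all(b - a == step for a, b in zip(occ, occ[1:]))
    return (occ[0], step, len(occ))


print(ipm_naive("abababab", 0, 4, 0, 7))  # (0, 2, 2)
```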

Paper Structure (11 sections, 16 theorems)


Key Result

Corollary 0

Suppose that we have read-only random access to a string $S$ of length $n$ over an integer alphabet. For any integer $\tau = O(n / \log^2 n)$, there is a data structure that can be built in $O(n\log_{n/\tau} n + (n/\tau) \cdot \log^4 n \log \log n)$ time using $O((n/\tau) \cdot \log n \ldots$
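As a worked reading of the construction-time bound in Corollary 0 (our own derivation from the visible formula, not a statement quoted from the paper), the bound becomes linear for a wide range of $\tau$. Fix a constant $\varepsilon \in (0,1)$; both terms of the bound are then controlled as follows.

```latex
\begin{align*}
\tau \le n^{1-\varepsilon}
  &\implies \frac{n}{\tau} \ge n^{\varepsilon}
  \implies \log_{n/\tau} n = \frac{\log n}{\log (n/\tau)} \le \frac{1}{\varepsilon}
  \implies n \log_{n/\tau} n = O(n/\varepsilon),\\
\tau \ge \log^4 n \log\log n
  &\implies \frac{n}{\tau} \cdot \log^4 n \log\log n \le n.
\end{align*}
```

Hence, for $\log^4 n \log\log n \le \tau \le n^{1-\varepsilon}$ (a range that also satisfies $\tau = O(n/\log^2 n)$ once $\log^2 n \le n^{\varepsilon}$), the construction time stated in Corollary 0 is $O(n/\varepsilon)$, that is, $O(n)$ for constant $\varepsilon$.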

Theorems & Definitions (19)

  • Corollary 0
  • Remark 10
  • Lemma 11
  • Definition 12: $\tau$-partitioning set
  • Theorem 13: DBLP:journals/corr/abs-2105-03782
  • Definition 14: $\tau$-runs
  • Lemma 16
  • Theorem 17
  • Corollary 17
  • Lemma 18
  • ...and 9 more
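Definition 14 is not reproduced on this page. The Python sketch below assumes the following simplified definition, which may differ from the paper's: a $\tau$-run is a maximal fragment of length at least $\tau$ whose shortest period is at most $\tau/3$. For each candidate period $p \le \lfloor \tau/3 \rfloor$, it finds the maximal stretches of positions $i$ with $S[i] = S[i+p]$, each of which yields a maximal fragment with period $p$, and records every fragment under the smallest period that produces it. The function `tau_runs` is a hypothetical helper; this brute-force scan takes $O(n\tau)$ time and only shows what such fragments look like.

```python
from typing import Dict, List, Tuple


def tau_runs(S: str, tau: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, period) for maximal fragments S[start:end] of
    length >= tau whose shortest period is <= tau // 3 (ASSUMED definition).
    """
    n = len(S)
    found: Dict[Tuple[int, int], int] = {}  # (start, end) -> smallest period
    for p in range(1, tau // 3 + 1):
        i = 0
        while i + p < n:
            if S[i] != S[i + p]:
                i += 1
                continue
            # Extend the stretch of positions j with S[j] == S[j + p].
            j = i
            while j + p < n and S[j] == S[j + p]:
                j += 1
            # S[i:j+p] has period p and cannot be extended in either direction.
            start, end = i, j + p
            # p increases, so the first period recorded for a fragment is the smallest.
            if end - start >= tau and (start, end) not in found:
                found[(start, end)] = p
            i = j + 1
    return sorted((s, e, p) for (s, e), p in found.items())


print(tau_runs("xaaaaaay", 6))  # [(1, 7, 1)]
```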