Space-efficient SLP encoding for $O(\log N)$-time random access
Akito Takasaka, Tomohiro I
TL;DR
The paper addresses random access on grammar-compressed strings represented by Straight-Line Programs (SLPs). It introduces a space-efficient encoding framework that combines the symmetric centroid decomposition of the SLP DAG (SC-paths) with compacted binary tries supporting interval-biased searches, so that a substring $T[p..q]$ can be extracted in worst-case $O(\log N + q - p)$ time. Here $N$ is the length of $T$, $\sigma$ is the alphabet size, $n$ is the number of variables in the SLP, and $n' \le n$ is the number of SC-paths. Three encodings are given, with the following space bounds in bits: (I) $n \lceil \lg N \rceil + (n + n') \lceil \lg (n+\sigma) \rceil + 4n - 2n' + o(n)$, (II) $n \lceil \lg N \rceil + n \lceil \lg (n+\sigma) \rceil + 5n + n' + o(n)$, and (III) $n \lceil \lg N \rceil + n \lceil \lg (n+\sigma) \rceil + 5n - n' + \sigma + o(n+\sigma)$. The query time is almost optimal: Verbin and Yu [CPM 2013] proved that the $O(\log N)$ term cannot be significantly improved in general with $\mathrm{poly}(n)$-space data structures. The contribution is a set of explicit, provably compact encodings with worst-case time guarantees for substring extraction on compressed strings.
Abstract
A Straight-Line Program (SLP) $G$ for a string $T$ is a context-free grammar (CFG) that derives only $T$, and it can be regarded as a compressed representation of $T$. In this paper, we show how to encode $G$ in $n \lceil \lg N \rceil + (n + n') \lceil \lg (n+\sigma) \rceil + 4n - 2n' + o(n)$ bits to support random access queries that extract $T[p..q]$ in worst-case $O(\log N + q - p)$ time, where $N$ is the length of $T$, $\sigma$ is the alphabet size, $n$ is the number of variables in $G$, and $n' \le n$ is the number of symmetric centroid paths in the DAG representation of $G$. The time complexity is almost optimal because Verbin and Yu [CPM 2013] proved that the $O(\log N)$ term cannot be significantly improved in general with $\mathrm{poly}(n)$-space data structures. We also present alternative encodings that achieve the same random access time with $n \lceil \lg N \rceil + n \lceil \lg (n+\sigma) \rceil + 5n + n' + o(n)$ or $n \lceil \lg N \rceil + n \lceil \lg (n+\sigma) \rceil + 5n - n' + \sigma + o(n+\sigma)$ bits of space.
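To make the query concrete, the following is a minimal sketch of the textbook way to extract $T[p..q]$ from an SLP: store the expansion length of every variable and descend from the start variable. It is not the encoding proposed in the paper. It takes time proportional to the height of the derivation tree plus the output length, rather than $O(\log N + q - p)$, and it uses plain Python lists instead of a compact bit-level layout. The rule representation, the function names `build_lengths` and `extract`, and the 0-based inclusive indexing are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical rule format: a one-character string is a terminal rule X -> c;
# a pair (left, right) is a binary rule X -> X_left X_right, where both
# indices refer to rules that appear earlier in the list.
Rule = Union[str, Tuple[int, int]]


def build_lengths(rules: List[Rule]) -> List[int]:
    """Compute the expansion length of every variable bottom-up."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def extract(rules: List[Rule], lengths: List[int], root: int, p: int, q: int) -> str:
    """Return T[p..q] (0-based, inclusive) without decompressing all of T."""
    if not (0 <= p <= q < lengths[root]):
        raise IndexError("substring range out of bounds")
    out: List[str] = []
    # Each stack entry is (variable, lo, hi): the part of that variable's
    # expansion still to be reported, in coordinates local to the variable.
    stack = [(root, p, q)]
    while stack:
        var, lo, hi = stack.pop()
        rule = rules[var]
        if isinstance(rule, str):
            out.append(rule)
            continue
        left, right = rule
        left_len = lengths[left]
        # Push the right child first so the left child is processed first.
        if hi >= left_len:
            stack.append((right, max(lo - left_len, 0), hi - left_len))
        if lo < left_len:
            stack.append((left, lo, min(hi, left_len - 1)))
    return "".join(out)


# A small Fibonacci-style SLP: X0 -> b, X1 -> a, X2 -> X1 X0, X3 -> X2 X1, ...
rules: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
lengths = build_lengths(rules)
root = len(rules) - 1
full_text = extract(rules, lengths, root, 0, lengths[root] - 1)
middle = extract(rules, lengths, root, 2, 5)
```

In this sketch the length array plays the role of the $n \lceil \lg N \rceil$-bit component of the space bounds, and the child indices correspond to the $\lceil \lg (n+\sigma) \rceil$-bit entries. The descent can take as many steps as the derivation tree is tall, which for an unbalanced grammar can be far more than $\log N$; the SC-path and trie machinery described above is what addresses that case.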
