Construction of Sparse Suffix Trees and LCE Indexes in Optimal Time and Space
Dmitry Kosolobov, Nikita Sivukhin
TL;DR
The paper addresses efficient construction of two small-space string indexes, sparse suffix trees (SSTs) and LCE indexes, for readonly strings, using a deterministic locally consistent parsing framework built on $\tau$-partitioning sets. It combines a Cole–Vishkin style partitioning method with Jeż's recompression to produce $\tau$-partitioning sets of size $\mathcal{O}(b)$ with $\tau = n/b$, using only $\mathcal{O}(b)$ space on top of the input; for $b \ge n^{\varepsilon}$ with constant $\varepsilon > 0$ the construction runs in linear time. The core results show that, for $\tau$ in a broad range (including $\tau \ge 4$ up to $\mathcal{O}(n/\log^2 n)$), one can deterministically construct SSTs and LCE indexes in $\mathcal{O}(n \log_b n)$ time using $\mathcal{O}(b)$ space, which subsumes and improves prior small-space deterministic constructions. The result is a general framework for small-space string indexing over readonly inputs with explicit time-space trade-offs.
Abstract
The notions of synchronizing and partitioning sets are recently introduced variants of locally consistent parsings with great potential in problem-solving. In this paper we propose a deterministic algorithm that, for a given readonly string of length $n$ over the alphabet $\{0,1,\ldots,n^{\mathcal{O}(1)}\}$, constructs a variant of a $\tau$-partitioning set of size $\mathcal{O}(b)$ with $\tau = \frac{n}{b}$ using $\mathcal{O}(b)$ space and $\mathcal{O}(\frac{1}{\varepsilon} n)$ time, provided $b \ge n^{\varepsilon}$ for $\varepsilon > 0$. As a corollary, for $b \ge n^{\varepsilon}$ and constant $\varepsilon > 0$, we obtain linear construction algorithms with $\mathcal{O}(b)$ space on top of the string for two major small-space indexes: a sparse suffix tree, which is a compacted trie built on $b$ chosen suffixes of the string, and a longest common extension (LCE) index, which occupies $\mathcal{O}(b)$ space and allows us to compute the longest common prefix for any pair of substrings in $\mathcal{O}(n/b)$ time. For both, the $\mathcal{O}(b)$ construction storage is asymptotically optimal, since the tree itself takes $\mathcal{O}(b)$ space and any LCE index with $\mathcal{O}(n/b)$ query time must occupy at least $\Omega(b)$ space by a known trade-off (at least for $b \ge \Omega(n / \log n)$). In the case of arbitrary $b \ge \Omega(\log^2 n)$, we present construction algorithms for the partitioning set, sparse suffix tree, and LCE index with $\mathcal{O}(n\log_b n)$ running time and $\mathcal{O}(b)$ space, thus also improving the state of the art.
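
To make the two indexes concrete, the following minimal sketch spells out what they compute. It is a naive reference, not the algorithm of the paper: the LCE query is answered by direct character comparison, and the $b$ chosen suffixes are ordered by plain sorting, with no attention to the time and space bounds stated above. The function names `lce`, `sparse_suffix_order`, and `sparse_lcp` are hypothetical and chosen only for illustration.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:].

    Naive character-by-character scan; an LCE index answers the same
    query faster using precomputed data.
    """
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


def sparse_suffix_order(s: str, positions: list[int]) -> list[int]:
    """Chosen suffix start positions in lexicographic order of the suffixes.

    Naive: materializes every chosen suffix as a sort key.
    """
    return sorted(positions, key=lambda p: s[p:])


def sparse_lcp(s: str, order: list[int]) -> list[int]:
    """LCE values between lexicographically adjacent chosen suffixes."""
    return [lce(s, a, b) for a, b in zip(order, order[1:])]


if __name__ == "__main__":
    text = "abracadabra"
    chosen = [0, 3, 7]  # b = 3 chosen suffixes
    order = sparse_suffix_order(text, chosen)
    print(order, sparse_lcp(text, order))
```

The sorted order of the chosen suffixes together with the adjacent LCE values determines the shape of the compacted trie on those suffixes, which is why the two indexes are commonly treated together.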
