Size-constrained Weighted Ancestors with Applications
Philip Bille, Yakov Nekrich, Solon P. Pissis
TL;DR
The paper defines the size-constrained weighted ancestor (SWA) problem on rooted trees, a variant of the weighted ancestor problem (itself a generalization of the predecessor problem) in which node weights never increase from a node to its descendants (a max-heap order) and are bounded by subtree size. It gives an SWA data structure with $\mathcal{O}(1)$ query time after $\mathcal{O}(n)$-time preprocessing, together with several string-processing applications on suffix trees. The approach combines heavy-path decomposition, rank/select on per-path bitstrings, and fusion trees, and then applies ART-decomposition-based tree contraction to reach linear space while preserving constant-time queries. These results yield efficient solutions for problems such as internal longest frequent prefix, longest frequent substring, and frequency-constrained substring complexity, all of which arise in suffix-tree-based text indexing and search.
Abstract
The weighted ancestor problem on a rooted node-weighted tree $T$ is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require $\Omega(\log\log n)$ time for queries provided $\mathcal{O}(n \,\text{polylog}\, n)$ space is available, where $n$ is the input size. The weighted ancestor problem has attracted considerable attention from the combinatorial pattern matching community due to its direct application to suffix trees. In this formulation of the problem, the nodes are weighted by string depth. This research has culminated in a data structure for weighted ancestors in suffix trees with $\mathcal{O}(1)$ query time and an $\mathcal{O}(n)$-time construction algorithm [Belazzougui et al., CPM 2021]. In this paper, we consider a different version of the weighted ancestor problem, where the nodes are weighted by any function $\textsf{weight}$ that maps the nodes of $T$ to positive integers, such that $\textsf{weight}(u)\le \textsf{size}(u)$ for any node $u$ and $\textsf{weight}(u_1)\le \textsf{weight}(u_2)$ if node $u_1$ is a descendant of node $u_2$, where $\textsf{size}(u)$ is the number of nodes in the subtree rooted at $u$. In the size-constrained weighted ancestor (SWA) problem, for any node $u$ of $T$ and any integer $k$, we are asked to return the lowest ancestor $w$ of $u$ with weight at least $k$. We show that for any rooted tree with $n$ nodes, we can locate node $w$ in $\mathcal{O}(1)$ time after $\mathcal{O}(n)$-time preprocessing. In particular, this implies a data structure for the SWA problem in suffix trees with $\mathcal{O}(1)$ query time and $\mathcal{O}(n)$-time preprocessing, when the nodes are weighted by $\textsf{weight}$. We also show several string-processing applications of this result.
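
To make the SWA query concrete, the following is a minimal reference sketch in Python. It is not the constant-time data structure of the paper: it simply walks from $u$ toward the root, so a query costs time proportional to the depth of $u$. The function names (`subtree_sizes`, `is_valid_swa_weighting`, `swa_naive`), the parent-array tree encoding, and the convention that a node counts as its own ancestor are all assumptions made for illustration.

```python
from typing import List, Optional


def subtree_sizes(parent: List[int]) -> List[int]:
    """Return size[u], the number of nodes in the subtree rooted at u.

    Assumption: node 0 is the root with parent[0] == -1, and
    parent[u] < u for every other node (parents precede children).
    """
    size = [1] * len(parent)
    # Children have larger indices, so size[u] is final when u is visited.
    for u in range(len(parent) - 1, 0, -1):
        size[parent[u]] += size[u]
    return size


def is_valid_swa_weighting(parent: List[int], weight: List[int]) -> bool:
    """Check the two SWA constraints: 1 <= weight(u) <= size(u), and
    weight(u) <= weight(parent(u)) for every non-root node u."""
    size = subtree_sizes(parent)
    for u in range(len(parent)):
        if not (1 <= weight[u] <= size[u]):
            return False
        if parent[u] != -1 and weight[u] > weight[parent[u]]:
            return False
    return True


def swa_naive(parent: List[int], weight: List[int], u: int, k: int) -> Optional[int]:
    """Return the lowest ancestor w of u with weight[w] >= k, or None.

    Assumption: u is treated as an ancestor of itself. Because weights
    never decrease toward the root, the first node on the upward walk
    that satisfies the threshold is the lowest such ancestor.
    """
    while u != -1 and weight[u] < k:
        u = parent[u]
    return None if u == -1 else u


# A small hypothetical tree: 0 is the root, 1 and 2 are its children,
# 3 and 4 are children of 1, and 5 is a child of 2.
parent = [-1, 0, 0, 1, 1, 2]
weight = [5, 2, 2, 1, 1, 1]

print(is_valid_swa_weighting(parent, weight))  # True
print(swa_naive(parent, weight, 3, 2))         # 1
print(swa_naive(parent, weight, 3, 3))         # 0
print(swa_naive(parent, weight, 3, 6))         # None
```

The walk-up baseline is useful mainly as a correctness oracle when testing a faster structure, since both must return the same node for every pair $(u, k)$.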
