Size-constrained Weighted Ancestors with Applications

Philip Bille, Yakov Nekrich, Solon P. Pissis

TL;DR

The paper defines the size-constrained weighted ancestor (SWA) problem on rooted trees, a variant of the weighted ancestor problem (itself a generalization of the predecessor problem) in which node weights form a max-heap, so a node's weight never exceeds that of its ancestors, and are bounded by subtree size. Its main result is an $\mathcal{O}(n)$-space data structure that answers SWA queries in $\mathcal{O}(1)$ time after $\mathcal{O}(n)$-time preprocessing. The approach combines heavy-path decomposition, rank/select on per-path bitstrings, and fusion trees, and then uses ART-decomposition-based tree contraction to reach linear space while preserving constant-time queries. Applied to suffix trees, the result yields efficient solutions for string-processing problems such as internal longest frequent prefix, longest frequent substring, and frequency-constrained substring complexity.

Abstract

The weighted ancestor problem on a rooted node-weighted tree $T$ is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require $\Omega(\log\log n)$ time for queries provided $\mathcal{O}(n\,\mathrm{polylog}\,n)$ space is available, where $n$ is the input size. The weighted ancestor problem has attracted a lot of attention from the combinatorial pattern matching community due to its direct application to suffix trees. In this formulation of the problem, the nodes are weighted by string depth. This research has culminated in a data structure for weighted ancestors in suffix trees with $\mathcal{O}(1)$ query time and an $\mathcal{O}(n)$-time construction algorithm [Belazzougui et al., CPM 2021]. In this paper, we consider a different version of the weighted ancestor problem, where the nodes are weighted by any function $\textsf{weight}$ that maps the nodes of $T$ to positive integers, such that $\textsf{weight}(u)\le \textsf{size}(u)$ for any node $u$ and $\textsf{weight}(u_1)\le \textsf{weight}(u_2)$ if node $u_1$ is a descendant of node $u_2$, where $\textsf{size}(u)$ is the number of nodes in the subtree rooted at $u$. In the size-constrained weighted ancestor (SWA) problem, for any node $u$ of $T$ and any integer $k$, we are asked to return the lowest ancestor $w$ of $u$ with weight at least $k$. We show that for any rooted tree with $n$ nodes, we can locate node $w$ in $\mathcal{O}(1)$ time after $\mathcal{O}(n)$-time preprocessing. In particular, this implies a data structure for the SWA problem in suffix trees with $\mathcal{O}(1)$ query time and $\mathcal{O}(n)$-time preprocessing, when the nodes are weighted by $\textsf{weight}$. We also show several string-processing applications of this result.
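
To make the problem statement concrete, here is a minimal Python sketch of the two weight constraints and of a naive SWA query that simply walks up parent pointers. It is a baseline taking time proportional to the depth of $u$, not the constant-time structure of the paper. The parent-array representation (root marked by `-1`), the function names, and the convention that a node counts as its own ancestor are assumptions made for illustration.

```python
from typing import Optional, Sequence


def subtree_sizes(parent: Sequence[int]) -> list[int]:
    """size[u] = number of nodes in the subtree rooted at u (root has parent -1)."""
    n = len(parent)
    children: list[list[int]] = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    order, stack = [], [parent.index(-1)]
    while stack:  # iterative preorder: every node is visited before its descendants
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = [1] * n
    for u in reversed(order):  # descendants are finalized before their ancestors
        if parent[u] >= 0:
            size[parent[u]] += size[u]
    return size


def is_size_constrained_max_heap(parent: Sequence[int], weight: Sequence[int]) -> bool:
    """Check 1 <= weight(u) <= size(u) and weight(child) <= weight(parent)."""
    size = subtree_sizes(parent)
    for u, p in enumerate(parent):
        if not 1 <= weight[u] <= size[u]:
            return False
        if p >= 0 and weight[u] > weight[p]:
            return False
    return True


def swa_naive(parent: Sequence[int], weight: Sequence[int], u: int, k: int) -> Optional[int]:
    """Lowest ancestor w of u (u included, by assumption) with weight(w) >= k, else None."""
    v = u
    while v != -1:
        if weight[v] >= k:
            return v
        v = parent[v]
    return None
```

For instance, on a suffix tree, weighting each node by the number of leaves below it satisfies both constraints, since a subtree never has more leaves than nodes and a descendant never has more leaves than its ancestor.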

Paper Structure

This paper contains 15 sections, 11 theorems, 1 equation, 5 figures.

Key Result

Theorem 1

For any rooted tree with $n$ nodes weighted by a size-constrained max-heap function $\mathsf{weight}$, there exists an $\mathcal{O}(n)$-space data structure answering $\mathsf{SWA}$ queries in $\mathcal{O}(1)$ time. The preprocessing algorithm runs in $\mathcal{O}(n)$ time and $\mathcal{O}(n)$ space.

Figures (5)

  • Figure 1: Weighted ancestor queries on the suffix tree of string $X=\texttt{CAGAGA\$}$. The leaf nodes in both trees are labeled by the starting position of the suffix of $X$ they represent.
  • Figure 2: A rooted tree $T$ with $n=16$ nodes. Each node $u$ of $T$ is weighted by $\mathsf{weight}(u)= \mathsf{size}(u)$. For example, $\mathsf{weight}(u_5)=\mathsf{size}(u_5)=9$, because there are $9$ nodes in the subtree rooted at $u_5$, and $\mathsf{SWA}(u_2,7)=u_5$ because the lowest ancestor of $u_2$ with weight at least $7$ is node $u_5$. A heavy-path decomposition of $T$ is also depicted: the heavy edges are the red edges. For example, the heavy path of the whole $T$ is $u_1u_2\ldots u_6$.
  • Figure 3: The two cases of the querying algorithm.
  • Figure 4: The contraction process of the tree $T$ from Figure 2.
  • Figure 5: Application of ART decompositions on $C_T$.
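
The heavy-path decomposition described in the Figure 2 caption can be sketched as follows. Because weights never increase from an ancestor to a descendant, the weights along each heavy path are sorted, so a query can skip whole paths and then search within a single path. This sketch uses plain binary search (`bisect`) in place of the rank/select and fusion-tree machinery named in the TL;DR, so it answers a query in $\mathcal{O}(\log n)$ time rather than $\mathcal{O}(1)$; the class name, the parent-array input, and the convention that a node is its own ancestor are assumptions for illustration only.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def preorder_and_sizes(parent: Sequence[int]) -> tuple[list[int], list[int]]:
    """Return (preorder, size) for a rooted tree given as a parent array (root has parent -1)."""
    n = len(parent)
    children: list[list[int]] = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    order, stack = [], [parent.index(-1)]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = [1] * n
    for u in reversed(order):  # descendants are finalized before their ancestors
        if parent[u] >= 0:
            size[parent[u]] += size[u]
    return order, size


class HeavyPathSWA:
    """Illustrative O(log n)-time SWA queries via heavy paths plus binary search."""

    def __init__(self, parent: Sequence[int], weight: Sequence[int]) -> None:
        order, size = preorder_and_sizes(parent)
        n = len(parent)
        self.parent = list(parent)
        # heavy[p]: a child of p with the largest subtree, or -1 if p is a leaf.
        heavy = [-1] * n
        for v in order:
            p = parent[v]
            if p >= 0 and (heavy[p] < 0 or size[v] > size[heavy[p]]):
                heavy[p] = v
        self.head = [0] * n   # topmost node of the heavy path containing u
        self.pos = [0] * n    # index of u on its path, counted from the head
        self.nodes: dict[int, list[int]] = {}  # head -> path nodes, top to bottom
        self.keys: dict[int, list[int]] = {}   # head -> negated weights (non-decreasing)
        for u in order:  # preorder guarantees the parent's head is already known
            p = parent[u]
            h = self.head[p] if p >= 0 and heavy[p] == u else u
            self.head[u] = h
            path = self.nodes.setdefault(h, [])
            self.pos[u] = len(path)
            path.append(u)
            self.keys.setdefault(h, []).append(-weight[u])

    def query(self, u: int, k: int) -> Optional[int]:
        """Lowest ancestor of u (u included) with weight >= k, or None if there is none."""
        v = u
        while v != -1:
            h = self.head[v]
            keys = self.keys[h]
            if -keys[0] >= k:
                # The head qualifies, so the answer is the deepest qualifying
                # node on this path at or above v.
                i = bisect_right(keys, -k, 0, self.pos[v] + 1) - 1
                return self.nodes[h][i]
            v = self.parent[h]  # no node on this path qualifies; jump above its head
        return None
```

Skipping a path whose head has weight below $k$ is safe only because of the max-heap property: every other node on that path is a descendant of the head and therefore weighs no more.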

Theorems & Definitions (14)

  • Theorem 1
  • Lemma 2: Rank and Select [Baumann and Hagerup, WADS 2019]
  • Lemma 3: Fusion Tree [Fredman and Willard, 1993]
  • Lemma 4: Heavy-path Decomposition [Sleator and Tarjan, JCSS 1983]
  • Example 5
  • Example 6
  • Lemma 7
  • Lemma 8: ART Decomposition [Alstrup et al., 1998]
  • Theorem 8
  • Theorem 9
  • ...and 4 more