Size-constrained Weighted Ancestors with Applications

Philip Bille; Yakov Nekrich; Solon P. Pissis

TL;DR

The paper defines the size-constrained weighted ancestor (SWA) problem on rooted trees, a variant of the weighted ancestor problem (itself a generalization of the predecessor problem) in which node weights are max-heap ordered and bounded by subtree size. It gives an $\mathcal{O}(n)$-space data structure that answers SWA queries in $\mathcal{O}(1)$ time after $\mathcal{O}(n)$-time preprocessing, together with several string-processing applications on suffix trees. The construction first combines heavy-path decomposition, rank/select on per-path bitstrings, and fusion trees, and then applies ART decomposition to a contracted tree to reach linear space while preserving constant-time queries. The applications include internal longest frequent prefix, longest frequent substring, and frequency-constrained substring complexity, all relevant to suffix-tree-based text indexing and search.

Abstract

The weighted ancestor problem on a rooted node-weighted tree $T$ is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require $\Omega(\log\log n)$ time for queries provided $\mathcal{O}(n\,\mathrm{polylog}\,n)$ space is available, where $n$ is the input size. The weighted ancestor problem has attracted a lot of attention from the combinatorial pattern matching community due to its direct application to suffix trees. In this formulation of the problem, the nodes are weighted by string depth. This research has culminated in a data structure for weighted ancestors in suffix trees with $\mathcal{O}(1)$ query time and an $\mathcal{O}(n)$-time construction algorithm [Belazzougui et al., CPM 2021]. In this paper, we consider a different version of the weighted ancestor problem, where the nodes are weighted by any function $\textsf{weight}$ that maps the nodes of $T$ to positive integers, such that $\textsf{weight}(u)\le \textsf{size}(u)$ for any node $u$ and $\textsf{weight}(u_1)\le \textsf{weight}(u_2)$ if node $u_1$ is a descendant of node $u_2$, where $\textsf{size}(u)$ is the number of nodes in the subtree rooted at $u$. In the size-constrained weighted ancestor (SWA) problem, for any node $u$ of $T$ and any integer $k$, we are asked to return the lowest ancestor $w$ of $u$ with weight at least $k$. We show that for any rooted tree with $n$ nodes, we can locate node $w$ in $\mathcal{O}(1)$ time after $\mathcal{O}(n)$-time preprocessing. In particular, this implies a data structure for the SWA problem in suffix trees with $\mathcal{O}(1)$ query time and $\mathcal{O}(n)$-time preprocessing, when the nodes are weighted by $\textsf{weight}$. We also show several string-processing applications of this result.
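To make the problem statement concrete, the following is a minimal baseline sketch, not the paper's data structure: it checks the two constraints on $\textsf{weight}$ and answers an SWA query by walking up the tree, which takes time proportional to the depth rather than $\mathcal{O}(1)$. The parent-array representation, the function names, and the convention that a node counts as its own ancestor are assumptions made for illustration.

```python
from typing import List, Optional


def subtree_sizes(parent: List[int]) -> List[int]:
    """size[u] = number of nodes in the subtree rooted at u.

    The tree is given as a parent array; the root has parent -1.
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for u, p in enumerate(parent):
        if p < 0:
            root = u
        else:
            children[p].append(u)
    # Preorder traversal: every node appears before its descendants.
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = [1] * n
    # Reverse preorder: every node is finished before its parent.
    for u in reversed(order):
        if parent[u] >= 0:
            size[parent[u]] += size[u]
    return size


def is_size_constrained(parent: List[int], weight: List[int]) -> bool:
    """Check 1 <= weight(u) <= size(u) and weight(child) <= weight(parent)."""
    size = subtree_sizes(parent)
    for u, p in enumerate(parent):
        if not (1 <= weight[u] <= size[u]):
            return False
        if p >= 0 and weight[u] > weight[p]:
            return False
    return True


def swa_naive(parent: List[int], weight: List[int], u: int, k: int) -> Optional[int]:
    """Lowest ancestor w of u with weight[w] >= k, or None if none exists.

    Assumption: u is treated as an ancestor of itself.
    """
    while u >= 0 and weight[u] < k:
        u = parent[u]
    return u if u >= 0 else None
```

On a hypothetical six-node tree with `parent = [-1, 0, 0, 1, 1, 3]` and `weight` equal to the subtree sizes, `swa_naive(parent, weight, 5, 3)` climbs from leaf 5 past node 3 (weight 2) and stops at node 1 (weight 4).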

Paper Structure (15 sections, 11 theorems, 1 equation, 5 figures)

Introduction
Applications
Preliminaries
Constant-time Queries using $\mathcal{O}(n\log n)$ Space
Heavy-path Decomposition
Data Structure
Queries
Constant-time Queries using $\mathcal{O}(n)$ Space
ART Decomposition
Data Structure
Queries
String-processing Applications
Internal Longest Frequent Prefix
Longest Frequent Substring
Frequency-constrained Substring Complexity

Key Result

Theorem 1

For any rooted tree with $n$ nodes weighted by a size-constrained max-heap function $\mathsf{weight}$, there exists an $\mathcal{O}(n)$-space data structure answering $\mathsf{SWA}$ queries in $\mathcal{O}(1)$ time. The preprocessing algorithm runs in $\mathcal{O}(n)$ time and $\mathcal{O}(n)$ space.

Figures (5)

Figure 1: Weighted ancestor queries on the suffix tree of string $X=\texttt{CAGAGA\$}$. The leaf nodes in both trees are labeled by the starting position of the suffix of $X$ they represent.
Figure 2: A rooted tree $T$ with $n=16$ nodes. Each node $u$ of $T$ is weighted by $\mathsf{weight}(u)= \mathsf{size}(u)$. For example, $\mathsf{weight}(u_5)=\mathsf{size}(u_5)=9$, because there are $9$ nodes in the subtree rooted at $u_5$, and $\mathsf{SWA}(u_2,7)=u_5$ because the lowest ancestor of $u_2$ with weight at least $7$ is node $u_5$. A heavy-path decomposition of $T$ is also depicted: the heavy edges are the red edges. For example, the heavy path of the whole $T$ is $u_1u_2\ldots u_6$.
Figure 3: The two cases of the querying algorithm.
Figure 4: The contraction process of the tree $T$ from Figure 2.
Figure 5: Application of ART decompositions on $C_T$.
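The heavy-path decomposition mentioned in the caption of Figure 2 can be sketched as follows. This is the standard textbook construction (each internal node keeps the edge to a child with the largest subtree as its heavy edge), written under the assumption of a parent-array tree with root parent -1. The function names are hypothetical, and the sketch does not reproduce the per-path structures that the paper builds on top of the decomposition.

```python
from typing import List, Tuple


def heavy_path_decomposition(parent: List[int]) -> Tuple[List[int], List[int]]:
    """Return (heavy, head) for a tree given as a parent array (root has -1).

    heavy[u] is the heavy child of u (-1 for leaves);
    head[u] is the topmost node of the heavy path containing u.
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for u, p in enumerate(parent):
        if p < 0:
            root = u
        else:
            children[p].append(u)

    # Preorder traversal, then subtree sizes in reverse preorder.
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])
    size = [1] * n
    for u in reversed(order):
        if parent[u] >= 0:
            size[parent[u]] += size[u]

    # The heavy child is a child with maximum subtree size.
    heavy = [-1] * n
    for u in range(n):
        if children[u]:
            heavy[u] = max(children[u], key=lambda c: size[c])

    # A node continues its parent's path only if it is the heavy child.
    head = [0] * n
    for u in order:  # parents are visited before their children
        p = parent[u]
        head[u] = head[p] if p >= 0 and heavy[p] == u else u
    return heavy, head


def light_edges_to_root(parent: List[int], head: List[int], u: int) -> int:
    """Count the light edges on the path from u up to the root."""
    count = 0
    while True:
        u = head[u]          # jump to the top of the current heavy path
        if parent[u] < 0:
            return count
        u = parent[u]        # cross one light edge
        count += 1
```

Because a light child has fewer than half as many nodes in its subtree as its parent, any node-to-root path crosses at most $\log_2 n$ light edges, which is what lets a query be reduced to a small number of per-path subproblems.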

Theorems & Definitions (14)

Theorem 1
Lemma 2: Rank and Select [Baumann and Hagerup, WADS 2019]
Lemma 3: Fusion Tree [Fredman and Willard, 1993]
Lemma 4: Heavy-path Decomposition [Sleator and Tarjan, JCSS 1983]
Example 5
Example 6
Lemma 7
Lemma 8: ART Decomposition [Alstrup et al., 1998]
Theorem 8
Theorem 9
...and 4 more
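As an illustration of why rank and select (Lemma 2) suit weights that are monotone along a path and bounded by subtree size, here is one standard encoding, offered as an assumption rather than as the paper's exact construction. The nondecreasing weights along a path, listed bottom to top, are written in unary as gaps, and "first node with weight at least $k$" becomes one select followed by one rank. The rank and select routines below are deliberately naive linear scans; Lemma 2 is what would make them constant time. All function names are hypothetical.

```python
from typing import List, Optional


def encode_unary(weights: List[int]) -> List[int]:
    """Encode nondecreasing positive weights as gap-many 0s followed by a 1.

    The bitstring has len(weights) ones and weights[-1] zeros, so its length
    is bounded by the path length plus the largest weight on the path.
    """
    bits: List[int] = []
    prev = 0
    for w in weights:
        if w < prev:
            raise ValueError("weights must be nondecreasing")
        bits.extend([0] * (w - prev))
        bits.append(1)
        prev = w
    return bits


def select0(bits: List[int], k: int) -> Optional[int]:
    """0-based position of the k-th zero (k >= 1), or None if there is none."""
    seen = 0
    for i, b in enumerate(bits):
        if b == 0:
            seen += 1
            if seen == k:
                return i
    return None


def rank1(bits: List[int], i: int) -> int:
    """Number of ones in bits[0:i]."""
    return sum(bits[:i])


def first_at_least(bits: List[int], k: int) -> Optional[int]:
    """0-based index of the first encoded weight that is >= k, or None."""
    k = max(k, 1)  # all weights are positive, so k < 1 behaves like k = 1
    pos = select0(bits, k)
    if pos is None:
        return None  # k exceeds the largest weight on the path
    # The k-th zero lies in the block of the first weight >= k;
    # the ones before it count the weights that are < k.
    return rank1(bits, pos)
```

For the hypothetical path weights `[1, 2, 2, 5, 9]`, the encoding has 5 ones and 9 zeros, and `first_at_least(bits, 3)` returns index 3, the node of weight 5.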
