
Online and Offline Algorithms for Counting Distinct Closed Factors via Sliding Suffix Trees

Takuya Mieno, Shun Takahashi, Kazuhisa Seto, Takashi Horiyama

TL;DR

This work addresses counting the distinct closed factors of a string via sliding suffix trees, presenting both an online and an offline algorithm. The online method runs in $O(n\log\sigma)$ time using $O(n)$ space by combining Ukkonen's online suffix tree construction with sliding-window suffix tree structures. The offline method runs in $O(n)$ time and space for linearly sortable alphabets by simulating the sliding suffix trees on a static suffix tree with weighted ancestor queries (WAQ). A border-based characterization using $t_j=\mathrm{lrs}(T[1..j])$ and $z_j=\mathrm{lrs}^2(T[1..j])$ underpins the counting; it enables the linear-time offline processing and gives a route to enumeration via geometric range data structures. The paper also explores enumeration trade-offs, showing subquadratic enumeration under certain conditions, and leaves open whether $O(n\,\mathrm{polylog}(n) + \mathrm{output})$ time can be achieved, i.e., time linear in the output size plus a near-linear term in $n$. These results advance the efficient analysis of repetitive structures in strings and contribute techniques for sliding-window string processing.

Abstract

A string is said to be closed if its length is one, or if it has a non-empty factor that occurs both as a prefix and as a suffix of the string, but does not occur elsewhere. The notion of closed words was introduced by [Fici, WORDS 2011]. Recently, the maximum number of distinct closed factors occurring in a string was investigated by [Parshina and Puzynina, Theor. Comput. Sci. 2024], and an asymptotically tight bound was proved. In this paper, we propose two algorithms to count the distinct closed factors in a string T of length n over an alphabet of size σ. The first algorithm runs in O(n log σ) time using O(n) space when T is given in an online manner. The second algorithm runs in O(n) time using O(n) space when T is given in an offline manner. Both algorithms utilize suffix trees for sliding windows.
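
To make the definition of a closed string concrete, the following is a minimal brute-force sketch in Python. It is not one of the paper's algorithms: it checks the definition directly and enumerates all factors, so it takes polynomial time and is only suitable as a reference on short strings. The function names `occurrences`, `is_closed`, and `count_distinct_closed_factors` are illustrative and do not come from the paper.

```python
def occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def is_closed(s: str) -> bool:
    """Check the definition directly: length one, or some non-empty
    proper border occurs only as a prefix and as a suffix."""
    n = len(s)
    if n == 1:
        return True
    for k in range(1, n):
        border = s[:k]
        # A border of length k < n occurs at positions 0 and n - k,
        # so exactly two occurrences means it occurs nowhere else.
        if s.endswith(border) and occurrences(s, border) == 2:
            return True
    return False


def count_distinct_closed_factors(t: str) -> int:
    """Brute force: collect all distinct factors, then test each one."""
    factors = {t[i:j] for i in range(len(t)) for j in range(i + 1, len(t) + 1)}
    return sum(1 for f in factors if is_closed(f))
```

For example, `is_closed("abcab")` holds because the border `ab` occurs only as a prefix and as a suffix, while `is_closed("abaa")` fails because the only border `a` also occurs in the middle. A checker like this could serve as a test oracle for a faster implementation.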

Paper Structure

This paper contains 11 sections, 9 theorems, 1 equation, and 2 figures.

Key Result

Theorem 1

For incremental $i = 1, 2, \ldots, n$, we can maintain the suffix tree of $T[1..i]$ and the length of $\mathit{lrs}(T[1.. i])$ in a total of $O(n \log \sigma)$ time.
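
As a point of reference for what Theorem 1 maintains, here is a naive Python sketch that computes the length of the longest repeating suffix of every prefix $T[1..i]$. It assumes that $\mathit{lrs}(w)$ denotes the longest suffix of $w$ that occurs at least twice in $w$, which matches the example in Figure 1. It does not build a suffix tree and takes polynomial time rather than the $O(n \log \sigma)$ total of the theorem; the name `lrs_lengths` is illustrative.

```python
def lrs_lengths(t: str) -> list[int]:
    """For each prefix t[:i], return the length of its longest suffix
    that also occurs somewhere earlier in that prefix (0 if none)."""
    lengths = []
    for i in range(1, len(t) + 1):
        prefix = t[:i]
        best = 0
        # Try suffix lengths from longest to shortest; stop at the first hit.
        for k in range(i - 1, 0, -1):
            # An occurrence inside prefix[:i-1] ends before position i,
            # so it is distinct from the occurrence as a suffix.
            if prefix[i - k:] in prefix[: i - 1]:
                best = k
                break
        lengths.append(best)
    return lengths
```

On `"babcab"`, the last entry of `lrs_lengths` is 2, corresponding to the longest repeating suffix `ab` marked by the active point in Figure 1.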

Figures (2)

  • Figure 1: The suffix tree of string $T = \mathtt{babcab}$. Each leaf of the tree represents a suffix of $T$, with the integer inside each leaf indicating the starting position of the suffix. Namely, the leaf labeled with number $i$ corresponds to $\mathsf{leaf}_T(i)$. In this suffix tree, $\mathsf{str}_T(u) = \mathtt{b}$, $\mathsf{strlen}_T(u) = 1$, $\mathsf{str}_T(\mathsf{leaf}_T(3)) = \mathtt{bcab}$, and $\mathsf{strlen}_T(\mathsf{leaf}_T(3)) = 4$. The star symbol indicates the locus of the active point, which represents the longest repeating suffix $\mathtt{ab}$ of $T$. The dotted arrows represent suffix links.
  • Figure 2: The suffix tree of string $T = \mathtt{ababcabcac\$}$ is shown on the left. The suffix tree of $T[2.. 7] = \mathtt{babcab}$, which is the same as the one in Fig. 1, is shown on the right. Suffix links are omitted in this figure. In the tree on the left, each node and edge enclosed in a bold line is connected to a corresponding one in the tree on the right. For clarity, those connections are not drawn.

Theorems & Definitions (10)

  • Theorem 1: [Ukkonen95]
  • Theorem 2: [FialaG89, Larsson96, Senft2005, LeonardarXiv]
  • Lemma 1
  • proof
  • Corollary 1
  • Theorem 3
  • Theorem 4
  • Corollary 2
  • Proposition 1
  • Proposition 2