Optimal-Time Move Structure Balancing and LCP Array Computation from the RLBWT

Nathaniel K. Brown, Ahsan Sanaullah, Shaojie Zhang, Ben Langmead

Abstract

On repetitive text collections of size $n$, the Burrows-Wheeler Transform (BWT) tends to have relatively few runs $r$ in its run-length encoded BWT (RLBWT). This motivates many RLBWT-related algorithms and data structures that can be designed in compressed $O(r)$-space. These approaches often use the RLBWT-derived permutations LF, FL, $\phi$, and $\phi^{-1}$, which can be represented using a move structure to obtain optimal $O(1)$-time for each permutation step in $O(r)$-space. They are then used to construct compressed-space text indexes supporting efficient pattern matching queries. However, move structure construction in $O(r)$-space requires an $O(r \log r)$-time balancing stage. The longest common prefix array (LCP) of a text collection is used to support pattern matching queries and data structure construction. Recently, it was shown how to compute the LCP array in $O(n + r \log r)$-time and $O(r)$ additional space from an RLBWT. However, the bottleneck remains the $O(r \log r)$-time move structure balancing stage. In this paper, we describe an optimal $O(r)$-time and space algorithm to balance a move structure. This result is then applied to LCP construction from an RLBWT to obtain an optimal $O(n)$-time algorithm in $O(r)$-space in addition to the output, which implies an optimal-time algorithm for LCP array enumeration in compressed $O(r)$-space.

Paper Structure (21 sections, 25 theorems, 1 equation)

Key Result

Corollary 1

Let $I = \{\, i \in [0,n) \mid \mathrm{PLCP}[i] \text{ is irreducible}\,\}$. Then, where $j = I.\mathrm{rank}(i)$, $\mathrm{PLCP}[i] = \mathrm{PLCP}^{+}[j] - (i - I[j])$.
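
The identity in Corollary 1 can be illustrated with a minimal Python sketch. It rests on assumptions the excerpt does not state: $I$ is stored as a sorted list, and $I.\mathrm{rank}(i)$ is taken to return the 0-based index of the largest irreducible position at or before $i$. The function and variable names are hypothetical, and the toy arrays are made up for illustration rather than derived from a real text.

```python
from bisect import bisect_right


def plcp_from_samples(irreducible_pos, plcp_plus, i):
    """Recover PLCP[i] from irreducible samples, following Corollary 1.

    irreducible_pos: sorted text positions whose PLCP value is irreducible (the set I).
    plcp_plus:       PLCP values at those positions, in the same order (PLCP+).
    """
    # Assumed rank convention: j is the index of the largest element of I that is <= i.
    j = bisect_right(irreducible_pos, i) - 1
    if j < 0:
        raise ValueError("no irreducible position at or before i")
    # PLCP[i] = PLCP+[j] - (i - I[j])
    return plcp_plus[j] - (i - irreducible_pos[j])


# Made-up toy data: every non-sampled entry equals the previous entry minus one,
# which is the behaviour the corollary relies on for reducible positions.
plcp = [3, 2, 1, 4, 3, 2, 1, 0]
irreducible_pos = [0, 3]
plcp_plus = [plcp[p] for p in irreducible_pos]

recovered = [plcp_from_samples(irreducible_pos, plcp_plus, i) for i in range(len(plcp))]
```

In this sketch only the two sampled values are stored, and every other entry is recovered by subtracting the distance to the preceding sample. A binary search stands in for the rank query here; the paper's own data structures for this step may differ.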

Theorems & Definitions (36)

  • Corollary 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Theorem 6 (Nishimoto and Tabei [nishimoto2021optimal]; Brown, Gagie, and Rossi [brown2022rlbwt])
  • Definition 7
  • Definition 8
  • Definition 9
  • Definition 10
  • ...and 26 more