Inverting Parameterized Burrows-Wheeler Transform

Shogen Kawanami, Kento Iseri, Tomohiro I

TL;DR

This paper proves that the parameterized Burrows-Wheeler Transform (pBWT) is invertible: from the pBWT $\mathsf{L}$ of a $p$-string $\mathsf{T}$ of length $n$, one can recover $\mathsf{T}$ up to renaming of parameter symbols in $O(n^2)$ time using $O(n)$ space. The authors formalize parameterized strings via s- and p-symbols, introduce prev-encoding to capture $p$-matching, and define the pBWT together with the LF-mapping. They present a constructive inversion algorithm: a first $O(n^3)$-time, $O(n^2)$-space approach using prefix encodings, then an optimized $O(n^2)$-time, $O(n)$-space method that iteratively refines rank arrays to recover the LF-mapping, and finally reconstructs a matching string in subquadratic time given LF. The work lays the groundwork for using pBWTs as compact indices for parameterized strings and discusses extensions to prev$_\infty$-encoding and future directions toward subquadratic inversion, with implications for p-matching data structures and reverse-engineering string indexes.
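As a concrete illustration of prev-encoding, the following minimal Python sketch keeps s-symbols unchanged and replaces each p-symbol by 0 at its first occurrence and by the distance to its previous occurrence otherwise. Two p-strings of equal length p-match exactly when their prev-encodings coincide. The function names `prev_encode` and `p_match`, and the choice to pass the set of p-symbols explicitly, are assumptions made for this example and are not taken from the paper.

```python
def prev_encode(text, p_symbols):
    """Return the prev-encoding of `text` as a list.

    s-symbols (anything not in `p_symbols`) are kept as-is.
    A p-symbol becomes 0 at its first occurrence, and otherwise the
    distance back to its previous occurrence.
    """
    last_seen = {}  # p-symbol -> index of its most recent occurrence
    encoded = []
    for pos, ch in enumerate(text):
        if ch in p_symbols:
            encoded.append(pos - last_seen[ch] if ch in last_seen else 0)
            last_seen[ch] = pos
        else:
            encoded.append(ch)
    return encoded


def p_match(a, b, p_symbols):
    """Two p-strings p-match iff their prev-encodings are equal."""
    return prev_encode(a, p_symbols) == prev_encode(b, p_symbols)


# Hypothetical alphabet split: x, y, z are p-symbols, '$' is an s-symbol.
P = set("xyz")
print(prev_encode("xyxzzxxyx$", P))  # [0, 0, 2, 0, 1, 3, 1, 6, 2, '$']
print(p_match("xyx$", "yzy$", P))    # True: rename x->y, y->z
print(p_match("xyx$", "xyy$", P))    # False
```

Because the encoding depends only on distances between repeated p-symbols, any consistent renaming of p-symbols leaves it unchanged, which is why the original string can be recovered only up to such a renaming.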

Abstract

The Burrows-Wheeler Transform (BWT) of a string is an invertible permutation of the string, which can be used for data compression and compact indexes for string pattern matching. Ganguly et al. [SODA, 2017] introduced the parameterized BWT (pBWT) to design compact indexes for parameterized matching (p-matching), a variant of string pattern matching with parameter symbols introduced by Baker [STOC, 1993]. Although the pBWT was inspired by the BWT, it is not obvious whether the pBWT itself is invertible or not. In this paper we show that we can retrieve the original string (up to renaming of parameter symbols) from the pBWT of length $n$ in $O(n^2)$ time and $O(n)$ space.
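For contrast, the sketch below inverts the classic BWT, not the pBWT, and it is not the paper's algorithm. In the classic setting the first column is simply the sorted last column, so the LF-mapping can be read off by a stable sort and the text recovered by walking it backwards. The abstract notes that the analogous invertibility is not obvious for the pBWT. The function names and the assumption that the text ends with a unique, lexicographically smallest `$` are choices made for this example.

```python
def bwt(text):
    """Classic BWT via sorted rotations (quadratic, for illustration only)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last):
    """Invert a classic BWT whose text ends with a unique smallest '$'."""
    n = len(last)
    # Stable sort of positions of L by symbol: order[f] is the position in L
    # of the symbol that sits at position f of the first column F.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for f_pos, l_pos in enumerate(order):
        lf[l_pos] = f_pos  # LF maps a position in L to its position in F
    # Row 0 is the rotation starting with '$'; following LF reads the text
    # from its end towards its beginning.
    out = []
    row = 0
    for _ in range(n):
        out.append(last[row])
        row = lf[row]
    rotated = "".join(reversed(out))  # equals '$' followed by the text body
    return rotated[1:] + rotated[0]


print(bwt("banana$"))          # annb$aa
print(inverse_bwt("annb$aa"))  # banana$
```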

Paper Structure

This paper contains 7 sections, 7 theorems, 4 equations, 3 figures, 1 table.

Key Result

Lemma 1

For a positive integer $U$, a string over an alphabet $[1..U]$ can be dynamically maintained while supporting insertion/deletion of a symbol to/from any position of the string as well as random access in $(m + o(m)) \lg U$ bits of space and $O(\frac{\lg m}{\lg \lg m})$ query and update times, where

Figures (3)

  • Figure 1: Illustrations for the cases of Proposition 3.
  • Figure 2: Illustration of the computation from $\langle\mathsf{T}_{\tau[i]}\rangle[1..2]$ to $\langle\mathsf{T}_{\tau[i]-1}\rangle[1..3]$ for our running example $\mathsf{T}=\mathtt{xyxzzxxyx\$}$ of Table 1. When $\langle\mathsf{T}_{\tau[i]}\rangle[1..2]$ is extended to the left according to Proposition 3, the 0s in red are turned into the distance to the beginning position in $\langle\mathsf{T}_{\tau[i]-1}\rangle[1..3]$.
  • Figure 3: Illustration of $\mathit{G}_{2}[i]$, $\mathit{Z}_{2}[i]$, $\mathit{E}_{2}[i]$ and $\langle\mathsf{T}_{\tau[i]-1}\rangle[3]$ for our running example $\mathsf{T}=\mathtt{xyxzzxxyx\$}$ of Table 1. $\mathit{G}_{2}[i]$ points to the smallest position at which $\langle\mathsf{T}_{\tau[i]-1}\rangle[1..2]$ appears in the sorted $\mathit{Pref}_{2}$. Note that $\langle\mathsf{T}_{\tau[i]}\rangle[1..2]$ and $\langle\mathsf{T}_{\tau[i]-1}\rangle[1..2]$ are not available explicitly. Still, $\langle\mathsf{T}_{\tau[i]-1}\rangle[3]$ can be computed from $\mathsf{L}[i]$, $\mathit{Z}_{2}[i]$ and $\mathit{E}_{2}[i]$. Sorting the pairs of the form $(\mathit{G}_{2}[i], \langle\mathsf{T}_{\tau[i]-1}\rangle[3])$ yields the refined intervals for prefixes of length 3.

Theorems & Definitions (12)

  • Lemma 1 (Munro and Nekrich, ESA 2015)
  • Lemma 2
  • proof
  • Proposition 3
  • Theorem 4
  • proof
  • Lemma 5
  • proof
  • Theorem 6
  • proof
  • ...and 2 more