Space-time Trade-offs for the LCP Array of Wheeler DFAs

Nicola Cotumaccio, Travis Gagie, Dominik Köppl, Nicola Prezza

TL;DR

The authors propose a sampling technique that gives access to any entry of the LCP array in logarithmic time while storing only a linear number of bits, and use it to obtain a space-time trade-off for computing matching statistics on a Wheeler DFA.

Abstract

Recently, Conte et al. generalized the longest common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires $O(n \log n)$ bits, $n$ being the number of states, while the compact representation of Wheeler DFAs often requires much less space. In particular, the BOSS representation of a de Bruijn graph requires only a linear number of bits if the alphabet size is constant. In this paper, we propose a sampling technique that allows accessing an entry of the LCP array in logarithmic time while storing only a linear number of bits. We use our technique to provide a space-time trade-off for computing matching statistics on a Wheeler DFA. In addition, we show that by augmenting the BOSS representation of a $k$-th order de Bruijn graph with a linear number of bits, we can navigate the underlying variable-order de Bruijn graph in time logarithmic in $k$, thus improving a previous bound by Boucher et al., which was linear in $k$ [DCC 2015].
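As background for the space bound mentioned in the abstract, the following is a minimal sketch of the classical LCP array on a plain string, the object that Conte et al. generalize to Wheeler DFAs. It is not the paper's construction: the function names `suffix_array` and `lcp_array` and the example text `banana$` are illustrative assumptions, and the construction is deliberately naive.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    """lcp[i] is the length of the longest common prefix of the suffixes
    starting at sa[i - 1] and sa[i]; lcp[0] is set to 0 by convention."""

    def common_prefix(a: int, b: int) -> int:
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        return n

    return [0] + [common_prefix(sa[i - 1], sa[i]) for i in range(1, len(sa))]


# Hypothetical example input, not taken from the paper.
text = "banana$"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(sa)   # suffix start positions in sorted order
print(lcp)  # one integer per position
```

Each entry of such an array is an integer that may need up to about $\log n$ bits, which is where a bound of $O(n \log n)$ bits for storing the array explicitly comes from.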

Paper Structure

This paper contains 5 sections, 5 theorems, 3 equations, 2 figures, 2 algorithms.

Key Result

theorem 1

We can augment the compact representation of a Wheeler DFA $\mathcal{A}$ with $O(n)$ bits ($O(n \log \log \sigma)$ bits, respectively), where $n$ is the number of states and $\sigma$ is the size of the alphabet, in such a way that we can compute each entry of the LCP array of $\mathcal{A}$ in $O(\lo

Figures (2)

  • Figure 1: (a) A Wheeler DFA. States are numbered according to the Wheeler order. (b) The array $\mathsf{LCP}_\mathcal{A}$, and the values needed to compute $G = (V, H)$. We assume that a range minimum query returns the largest position of a minimum value. (c) The graph $G = (V, H)$, with $V(\lceil\log n\rceil) = V(4) = \{v_{24}, v_{32} \}$ (yellow states). (d) The data structures that we store.
  • Figure 2: The $3$-rd order de Bruijn graph for the set $\mathcal{S} = \{CGAC, GACG, GACT, TACG, GTCG, ACGA, ACGT, TCGA, CGTC \}$ from Boucher et al. [DCC 2015]. We proceed as in Figure 1 (now we only consider odd entries of $\mathsf{LCP}_G$, and $h = \lceil\log k\rceil = 2$).
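
To make the set $\mathcal{S}$ in the Figure 2 caption concrete, here is a rough sketch that builds a de Bruijn graph from those nine strings. It assumes a common convention (nodes are the length-$k$ prefixes and suffixes of each string, with one labeled edge per string) and ignores any padding nodes that a BOSS representation may add; the function name `de_bruijn_graph` is hypothetical. The last step sorts node labels by their reverses (co-lexicographic order), which is the usual node ordering in BOSS-style representations and is likewise an assumption here.

```python
from collections import defaultdict


def de_bruijn_graph(kmers: set[str], k: int):
    """Build nodes and labeled out-edges from strings of length k + 1.

    Each string s contributes an edge s[:k] -> s[1:] labeled s[-1].
    """
    nodes: set[str] = set()
    edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s in sorted(kmers):  # sorted for deterministic edge order
        if len(s) != k + 1:
            raise ValueError(f"expected length {k + 1}, got {s!r}")
        src, dst = s[:k], s[1:]
        nodes.update((src, dst))
        edges[src].append((s[-1], dst))
    return nodes, dict(edges)


# The set S from the Figure 2 caption, with k = 3.
S = {"CGAC", "GACG", "GACT", "TACG", "GTCG", "ACGA", "ACGT", "TCGA", "CGTC"}
nodes, edges = de_bruijn_graph(S, k=3)

# Node labels ordered by their reversed strings (co-lexicographic order).
colex_nodes = sorted(nodes, key=lambda v: v[::-1])

for v in colex_nodes:
    print(v, edges.get(v, []))
```

Under these assumptions the graph has one edge per string in $\mathcal{S}$, and nodes such as `GAC` and `ACG` have two outgoing edges each.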

Theorems & Definitions (8)

  • theorem 1
  • theorem 2
  • theorem 3
  • theorem 4
  • theorem 5
  • definition 1
  • definition 2
  • definition 3