Merging RLBWTs adaptively

Travis Gagie

TL;DR

This work addresses merging two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended BWT (eBWT) without decompressing them. The author introduces move structures that support evaluating the inverse LF mapping ($\Psi$) and comparing contexts in $O(\log r)$ time. These yield a merge that runs in $O(r)$ space and $O((r + L) \log (m + n))$ time, where $m$ and $n$ are the uncompressed lengths, $r$ is the number of runs in the final eBWT, and $L$ is the sum of its irreducible LCP values. An initial analysis gives $O((r \log r + L) \log (m + n))$ time; an optimization that reduces the $O(\log r)$ context-comparison overhead then achieves the final bound. The approach lays groundwork for merging more than two RLBWTs and is motivated by pangenomic references, where compressed, scalable merges are of practical relevance.

Abstract

We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in $O (r)$ space and $O ((r + L) \log (m + n))$ time, where $m$ and $n$ are the lengths of the uncompressed strings, $r$ is the number of runs in the final eBWT and $L$ is the sum of its irreducible LCP values.
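To make the quantities in the abstract concrete, here is a brute-force sketch (quadratic, for checking small examples only) that builds the eBWT of a set of strings and measures $r$ and $L$. It assumes the usual eBWT definition, in which every rotation of every string is sorted by comparing infinite repetitions (omega-order), and it assumes that an LCP value is irreducible when it sits at a run boundary, i.e. where $\mathrm{BWT}[i] \neq \mathrm{BWT}[i-1]$. The function names are hypothetical and are not from the paper.

```python
from itertools import groupby


def omega_prefix(rotation: str, length: int) -> str:
    """First `length` characters of rotation^omega (the rotation repeated forever)."""
    return (rotation * (length // len(rotation) + 1))[:length]


def ebwt_bruteforce(strings):
    """eBWT of `strings` by sorting every rotation of every string in omega-order.

    Returns the eBWT and, for each row, a truncated omega-prefix of its context.
    """
    total = sum(len(s) for s in strings)
    # Distinct infinite repetitions R^omega != Q^omega already differ within
    # |R| + |Q| - gcd(|R|, |Q|) characters (Fine and Wilf), so 2 * total suffices.
    length = 2 * total
    rows = []
    for s in strings:
        for k in range(len(s)):
            rotation = s[k:] + s[:k]
            rows.append((omega_prefix(rotation, length), rotation[-1]))
    rows.sort(key=lambda row: row[0])  # stable sort: ties keep input order
    return "".join(last for _, last in rows), [key for key, _ in rows]


def count_runs(bwt: str) -> int:
    """r: number of maximal runs of equal characters."""
    return sum(1 for _ in groupby(bwt))


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k


def irreducible_lcp_sum(bwt: str, keys) -> int:
    """L: sum of LCP values at run boundaries (bwt[i] != bwt[i - 1])."""
    return sum(lcp(keys[i - 1], keys[i])
               for i in range(1, len(bwt)) if bwt[i] != bwt[i - 1])


ebwt, keys = ebwt_bruteforce(["ab", "b"])
print(ebwt, count_runs(ebwt), irreducible_lcp_sum(ebwt, keys))  # bab 3 1
```

For the single string banana the same function returns the cyclic BWT nnbaaa, with $r = 3$ and $L = 3$.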

Paper Structure

This paper contains 7 sections, 7 theorems, 14 equations, 1 figure.

Key Result

Lemma 1

Given the RLBWTs $\mathrm{BWT}_S$ and $\mathrm{BWT}_T$, in $O (r \log r)$ time we can build $O (r)$-space data structures for iteratively evaluating the $\Psi$ functions for $S$ and $T$ in $O (\log r)$ time plus constant time per iteration.
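One way to picture Lemma 1 is the following sketch, which assumes a cyclic BWT given as 0-indexed (character, length) run pairs. $\Psi$ is the inverse of LF, and LF maps each BWT run contiguously into the F column, so $\Psi$ is a shift function with exactly $r$ pieces. In this sketch, `find` locates a piece by binary search in $O(\log r)$ time. `step` then iterates move-structure style: it reuses the current piece index and scans forward instead of searching again. The names `build_psi_intervals` and `PsiMove` are hypothetical. The sketch also omits the interval-splitting (balancing) step of move structures, which is what bounds the forward scan by a constant, so here the scan is not constant-time in the worst case.

```python
from bisect import bisect_right


def build_psi_intervals(runs):
    """Psi as r shifted intervals (in_start, out_start, length), sorted by in_start."""
    totals = {}
    for c, length in runs:
        totals[c] = totals.get(c, 0) + length
    C, acc = {}, 0  # C[c]: number of BWT characters smaller than c
    for c in sorted(totals):
        C[c] = acc
        acc += totals[c]
    seen, intervals, start = {}, [], 0
    for c, length in runs:
        rank = seen.get(c, 0)  # copies of c in earlier runs
        # LF maps the run [start, start + length) contiguously onto
        # F positions [C[c] + rank, ...); Psi maps them back.
        intervals.append((C[c] + rank, start, length))
        seen[c] = rank + length
        start += length
    intervals.sort()
    return intervals


class PsiMove:
    """Hypothetical move-structure-style evaluator for Psi (balancing omitted)."""

    def __init__(self, runs):
        self.iv = build_psi_intervals(runs)
        self.starts = [a for a, _, _ in self.iv]
        # dest[k]: input interval containing the first output position of interval k.
        self.dest = [self.find(b) for _, b, _ in self.iv]

    def find(self, i):
        """O(log r) binary search for the input interval containing position i."""
        return bisect_right(self.starts, i) - 1

    def psi(self, i):
        a, b, _ = self.iv[self.find(i)]
        return b + (i - a)

    def step(self, i, k):
        """Given i inside input interval k, return (Psi(i), interval of Psi(i))."""
        a, b, _ = self.iv[k]
        j = b + (i - a)
        k2 = self.dest[k]
        # Scan forward; balancing would bound this loop by a constant.
        while k2 + 1 < len(self.starts) and self.starts[k2 + 1] <= j:
            k2 += 1
        return j, k2


walker = PsiMove([("n", 2), ("b", 1), ("a", 3)])  # cyclic BWT "nnbaaa" of "banana"
i, k = 0, walker.find(0)  # one O(log r) search to start
for _ in range(6):
    i, k = walker.step(i, k)  # later steps reuse the interval index
```

Starting from row 0, the walk visits rows 3, 2, 5, 1, 4, 0. This is the same sequence that repeated calls to `psi` produce, but it performs only one binary search.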

Figures (1)

  • Figure 1: Pseudocode for merging $\mathrm{BWT}_S [1..m]$ and $\mathrm{BWT}_T [1..n]$ into $\mathrm{BWT}_{S, T} [1..m + n]$.
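Figure 1's pseudocode is not reproduced here. As a rough point of reference, the following is a naive merge that is not the paper's algorithm. It works on uncompressed BWTs and takes $O((m + n)^2)$ time in the worst case. It assumes both inputs are cyclic BWTs, whose rows are sorted rotations and therefore already in omega-order. Because of that, a merge-sort-style pass is enough, provided two contexts can be compared. The sketch compares a row of $S$ with a row of $T$ by walking $\Psi$ on both in lockstep and reading the F column. It stops after at most $m + n$ characters. All function names are hypothetical.

```python
def psi_and_f(bwt):
    """Psi (inverse LF) and F column of a cyclic BWT, uncompressed for simplicity."""
    # A stable sort of BWT positions by character places position i at LF(i),
    # so order[j] = Psi(j) and F[j] = bwt[Psi(j)].
    order = sorted(range(len(bwt)), key=lambda i: bwt[i])
    return order, "".join(bwt[i] for i in order)


def compare_contexts(s, i, t, j):
    """Omega-order comparison of row i of S with row j of T: -1, 0 or 1."""
    psi_s, f_s = s
    psi_t, f_t = t
    # Reading F[i], F[Psi(i)], ... spells the row's context; by Fine and Wilf,
    # m + n characters decide whether the two infinite contexts differ.
    for _ in range(len(f_s) + len(f_t)):
        if f_s[i] != f_t[j]:
            return -1 if f_s[i] < f_t[j] else 1
        i, j = psi_s[i], psi_t[j]
    return 0


def merge_bwts(bwt_s, bwt_t):
    """Merge two cyclic BWTs into the eBWT of {S, T} (hypothetical naive sketch)."""
    s, t = psi_and_f(bwt_s), psi_and_f(bwt_t)
    out, i, j = [], 0, 0
    while i < len(bwt_s) and j < len(bwt_t):
        if compare_contexts(s, i, t, j) <= 0:
            out.append(bwt_s[i])
            i += 1
        else:
            out.append(bwt_t[j])
            j += 1
    out.append(bwt_s[i:])
    out.append(bwt_t[j:])
    return "".join(out)


print(merge_bwts("ba", "b"))  # bab: the eBWT of {ab, b}
```

When two contexts are equal, the rows come from powers of the same primitive word and end in the same character, so the tie-breaking rule does not change the output. The paper's contribution is to run this kind of merge on the run-length representations, paying per run and per irreducible LCP value rather than per character.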

Theorems & Definitions (7)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 4
  • Lemma 5
  • Theorem 6
  • Lemma 6