Merging RLBWTs adaptively

Travis Gagie

TL;DR

This work shows how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended BWT (eBWT) efficiently. It introduces move structures that support evaluating the inverse LF mapping ($\Psi$) and comparing contexts in $O(\log r)$ time, enabling a merge that runs in $O(r)$ space and $O((r + L) \log (m + n))$ time, where $m$ and $n$ are the lengths of the uncompressed strings, $r$ is the number of runs in the final eBWT, and $L$ is the sum of its irreducible LCP values. An initial analysis yields $O((r \log r + L) \log (m + n))$ time; an optimization that reduces the $O(\log r)$ context-comparison overhead then brings this down to the final bound. The approach lays groundwork for merging more than two RLBWTs and is relevant in practice for pangenomic references, where merges must stay compressed and scale.
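To make the LF mapping and its inverse $\Psi$ concrete, here is a minimal Python sketch that computes both from an uncompressed BWT string using plain arrays. It is only an illustration: it uses space linear in the text length, not the $O(r)$-space move structures described above, and the function name `lf_and_psi` is an assumption made for this example.

```python
def lf_and_psi(bwt):
    """Return (lf, psi) for a BWT given as a plain string.

    lf[i] is the position in the sorted first column F of the character
    occurrence bwt[i]; psi is the inverse permutation of lf.
    Illustrative only: linear space, not a move structure.
    """
    # A stable sort of positions by character sends the k-th occurrence
    # of a character in the BWT to its k-th occurrence in F.
    psi = sorted(range(len(bwt)), key=lambda i: bwt[i])
    lf = [0] * len(bwt)
    for f_pos, l_pos in enumerate(psi):
        lf[l_pos] = f_pos
    return lf, psi


lf, psi = lf_and_psi("nnbaaa")
# Applying psi after lf returns to the starting position.
assert all(psi[lf[i]] == i for i in range(len(lf)))
```

The stable sort is what keeps equal characters in the same relative order in both columns, which is the property the LF mapping relies on.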

Abstract

We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in $O(r)$ space and $O((r + L) \log (m + n))$ time, where $m$ and $n$ are the lengths of the uncompressed strings, $r$ is the number of runs in the final eBWT and $L$ is the sum of its irreducible LCP values.
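As a point of reference for what the merge produces, the following Python sketch builds the eBWT of a small set of strings naively, by sorting all rotations in $\omega$-order, and then run-length encodes it so that $r$ can be read off as the number of runs. This is a quadratic-time baseline on uncompressed input, not the adaptive merge of the paper; it assumes non-empty strings without terminator symbols, and the names `ebwt` and `run_length_encode` are chosen for this example.

```python
from itertools import groupby


def ebwt(strings):
    """Naive eBWT of a list of non-empty strings (illustrative baseline)."""
    # Comparing the first |u| + |v| characters of u^omega and v^omega
    # decides their omega-order, so the total length k is always enough.
    k = sum(len(s) for s in strings)
    rotations = []
    for s in strings:
        for i in range(len(s)):
            rot = s[i:] + s[:i]
            # Length-k prefix of the infinite repetition of this rotation.
            key = (rot * (k // len(rot) + 1))[:k]
            rotations.append((key, rot[-1]))
    rotations.sort(key=lambda pair: pair[0])
    return "".join(last for _, last in rotations)


def run_length_encode(text):
    """Collapse a string into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


merged = ebwt(["ab", "b"])
runs = run_length_encode(merged)
r = len(runs)  # number of runs in the merged eBWT
```

Rotations whose infinite repetitions are equal tie under this key, but they necessarily end in the same character, so the tie-breaking order does not change the output.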

