Merging RLBWTs adaptively
Travis Gagie
TL;DR
This work shows how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended BWT (eBWT). The paper introduces move structures that support evaluating the inverse LF mapping ($\Psi$) and comparing contexts in $O(\log r)$ time, giving a merge that runs in $O(r)$ space and $O((r + L) \log (m + n))$ time, where $m$ and $n$ are the lengths of the uncompressed strings, $r$ is the number of runs in the final eBWT, and $L$ is the sum of its irreducible LCP values. An initial analysis yields $O((r \log r + L) \log (m + n))$ time; an optimization that reduces the $O(\log r)$ context-comparison overhead then brings this down to the final bound. The approach lays groundwork for merging more than two RLBWTs and is relevant to pangenomic references, which need compressed, scalable merges.
Abstract
We show how to merge two run-length compressed Burrows-Wheeler Transforms (RLBWTs) into a run-length compressed extended Burrows-Wheeler Transform (eBWT) in $O(r)$ space and $O((r + L) \log (m + n))$ time, where $m$ and $n$ are the lengths of the uncompressed strings, $r$ is the number of runs in the final eBWT and $L$ is the sum of its irreducible LCP values.
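
To make the objects in this statement concrete, the following minimal Python sketch computes an eBWT naively from uncompressed strings and run-length encodes it, so that $r$ can be read off as the number of runs. It is not the merging algorithm described here: it works in uncompressed space and serves only as a reference for what the merged output should look like on small inputs. The function names `ebwt` and `run_length_encode` are hypothetical, and the sketch assumes non-empty input strings without end-of-string sentinels.

```python
from itertools import groupby


def ebwt(strings):
    """Naive extended BWT of a collection of non-empty strings.

    Rotations are sorted by the order of their infinite repetitions.
    Comparing the first len(u) + len(v) characters of u^omega and v^omega
    decides that order, so a key of length 2 * max(len) is enough.
    Assumption: no sentinels are appended; this is a reference sketch only.
    """
    key_len = 2 * max(len(s) for s in strings)
    rotations = []
    for s in strings:
        for i in range(len(s)):
            rot = s[i:] + s[:i]
            reps = key_len // len(rot) + 1
            # (sort key, last character of the rotation)
            rotations.append(((rot * reps)[:key_len], rot[-1]))
    rotations.sort(key=lambda pair: pair[0])
    return "".join(last for _, last in rotations)


def run_length_encode(text):
    """Return the runs of text as (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]


if __name__ == "__main__":
    merged = ebwt(["ab", "aab"])
    runs = run_length_encode(merged)
    print(merged, runs, "r =", len(runs))
```

For the inputs `"ab"` and `"aab"` this yields the eBWT `babaa` with $r = 4$ runs. A compressed merge should produce the same run-length encoded output without ever materializing the rotations.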
