Minimizers in Semi-Dynamic Strings

Wiktor Zuba; Oded Lachish; Solon P. Pissis

TL;DR

This work addresses the efficient computation of minimizers under semi-dynamic string updates, a common need in sequence analysis where a window slides along a string or a string is extended or trimmed at its ends. It introduces a semi-dynamic string model, in which letters are inserted or deleted only at the two ends of the string, together with a minimizer data structure that supports $\mathcal{O}(1)$-time minimizer queries and amortized $\mathcal{O}(1)$-time updates. A space-efficient variant uses $\mathcal{O}(\sqrt{w})$ working space with the same asymptotic time bounds, which enables computing $\mathcal{M}_{w,k,\rho}(S)$ in $\mathcal{O}(n)$ time using working space sublinear in $w$. The paper further develops a two-layer (and general multi-layer) framework to bound the stored information and the rebuild behavior, and supports it with theoretical guarantees. An application to minimizers on a weighted trie demonstrates the approach's usefulness in reducing space and time in realistic genomic settings, with experimental results favoring the proposed structures over traditional $\mathcal{O}(w)$-space sliding-window methods.
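To make the semi-dynamic interface concrete, here is a minimal Python sketch of a string that accepts insertions and deletions at both ends and reports the position of its smallest length-$k$ substring (leftmost on ties). It is not the paper's data structure: it swaps in the textbook "deque from two min-stacks" technique, uses plain lexicographic order as a stand-in for $\rho$, takes space linear in the string length, and pays $\mathcal{O}(k)$ per update to materialize each $k$-mer. The names `MinStack`, `SemiDynamicMinimizer`, and all method names are hypothetical.

```python
from collections import deque
from itertools import islice


class MinStack:
    """Stack whose entries also record the minimum key at or below them."""

    def __init__(self):
        self.items = []  # list of (key, min_of_stack_up_to_here)

    def push(self, key):
        best = key if not self.items else min(key, self.items[-1][1])
        self.items.append((key, best))

    def pop(self):
        return self.items.pop()[0]

    def keys(self):
        return [key for key, _ in self.items]


class SemiDynamicMinimizer:
    """Illustrative only: NOT the structure from the paper (assumption)."""

    def __init__(self, k):
        self.k = k
        self.letters = deque()   # the current string X
        self.left = 0            # absolute position of X[0]
        self.front = MinStack()  # top = leftmost k-mer of X
        self.back = MinStack()   # top = rightmost k-mer of X

    @staticmethod
    def _refill(empty, full):
        # Move the half of `full` nearest to the empty end into `empty`.
        keys = full.keys()
        half = (len(keys) + 1) // 2
        full.items.clear()
        for key in reversed(keys[:half]):
            empty.push(key)
        for key in keys[half:]:
            full.push(key)

    def push_back(self, c):
        self.letters.append(c)
        n = len(self.letters)
        if n >= self.k:
            kmer = "".join(islice(reversed(self.letters), self.k))[::-1]
            self.back.push((kmer, self.left + n - self.k))

    def push_front(self, c):
        self.letters.appendleft(c)
        self.left -= 1
        if len(self.letters) >= self.k:
            kmer = "".join(islice(self.letters, self.k))
            self.front.push((kmer, self.left))

    def pop_back(self):
        if len(self.letters) >= self.k:  # the rightmost k-mer disappears
            if not self.back.items:
                self._refill(self.back, self.front)
            self.back.pop()
        self.letters.pop()

    def pop_front(self):
        if len(self.letters) >= self.k:  # the leftmost k-mer disappears
            if not self.front.items:
                self._refill(self.front, self.back)
            self.front.pop()
        self.letters.popleft()
        self.left += 1

    def minimizer(self):
        """Position in X of the smallest k-mer, leftmost on ties."""
        tops = [s.items[-1][1] for s in (self.front, self.back) if s.items]
        if not tops:
            raise ValueError("string is shorter than k")
        return min(tops)[1] - self.left
```

Keys are pairs `(kmer, absolute_position)`, so tuple comparison resolves ties toward the leftmost occurrence. A pop from an empty stack moves half of the other stack across, which is what gives amortized constant-time stack operations in this classical construction.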

Abstract

Minimizers sampling is one of the most widely used mechanisms for sampling strings. Let $S=S[0]\ldots S[n-1]$ be a string over an alphabet $\Sigma$. In addition, let $w\geq 2$ and $k\geq 1$ be two integers and $\rho=(\Sigma^k,\leq)$ be a total order on $\Sigma^k$. The minimizer of window $X=S[i\mathinner{.\,.} i+w+k-2]$ is the smallest position in $[i,i+w-1]$ where the smallest length-$k$ substring of $S[i\mathinner{.\,.} i+w+k-2]$ based on $\rho$ starts. The set of minimizers for all $i\in[0,n-w-k+1]$ is the set $\mathcal{M}_{w,k,\rho}(S)$ of the minimizers of $S$. The set $\mathcal{M}_{w,k,\rho}(S)$ can be computed in $\mathcal{O}(n)$ time. The folklore algorithm for this computation computes the minimizer of every window in $\mathcal{O}(1)$ amortized time using $\mathcal{O}(w)$ working space. It is thus natural to pose the following two questions:

Question 1: Can we efficiently support other dynamic updates on the window?

Question 2: Can we improve on the $\mathcal{O}(w)$ working space?

We answer both questions in the affirmative:

1. We term a string $X$ semi-dynamic when one is allowed to insert or delete a letter at any of its ends. We show a data structure that maintains a semi-dynamic string $X$ and supports minimizer queries in $X$ in $\mathcal{O}(1)$ time with amortized $\mathcal{O}(1)$ time per update operation.

2. We show that this data structure can be modified to occupy strongly sublinear space without increasing the asymptotic complexity of its operations. To the best of our knowledge, this yields the first algorithm for computing $\mathcal{M}_{w,k,\rho}(S)$ in $\mathcal{O}(n)$ time using $\mathcal{O}(\sqrt{w})$ working space.

We complement our theoretical results with a concrete application and an experimental evaluation.
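For reference, the folklore sliding-window computation mentioned in the abstract can be sketched with a monotone deque of candidate positions. This is a minimal illustration under stated assumptions rather than the authors' code: lexicographic order stands in for $\rho$, the function name `minimizers` is hypothetical, and comparing $k$-mers as string slices costs $\mathcal{O}(k)$ per comparison, so the $\mathcal{O}(n)$ bound would additionally need constant-time $k$-mer comparisons (for example, via precomputed ranks or fingerprints).

```python
from collections import deque


def minimizers(S, w, k):
    """Return the sorted set M_{w,k,rho}(S), with rho = lexicographic order.

    Assumption: k-mers are compared as Python string slices.
    """
    n = len(S)
    result = set()
    # Start positions of candidate k-mers. Their k-mers are non-decreasing
    # from front to back, so the front is the smallest, leftmost candidate.
    candidates = deque()
    for j in range(n - k + 1):           # j = start of the newest k-mer
        kmer = S[j:j + k]
        # Strictly larger k-mers to the left of j can never win again.
        # Equal k-mers are kept so that the leftmost occurrence wins ties.
        while candidates and S[candidates[-1]:candidates[-1] + k] > kmer:
            candidates.pop()
        candidates.append(j)
        i = j - w + 1                    # start of the window ending at k-mer j
        if i >= 0:
            while candidates[0] < i:     # drop candidates left of the window
                candidates.popleft()
            result.add(candidates[0])
    return sorted(result)


print(minimizers("ACGTACGT", 3, 2))  # [0, 1, 4]
```

The deque holds at most $w$ positions at any time, which is the $\mathcal{O}(w)$ working space that Question 2 asks to improve on. Each position is appended once and removed at most once, which gives the amortized constant number of deque operations per window.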
