Efficient Computation of Periods and Covers Using Sampling

Thierry Lecroq; Francesco Pio Marino

TL;DR

This paper introduces a novel application of Characters-Distance-Sampling (CDS) to compute fundamental string regularities, specifically the period and the shortest cover. It develops CDS-based algorithms that operate directly on the CDS representation with a single pivot (the first character), preserving linear-time behavior while enabling substantial speedups over classical methods. Empirically, the CDS-based approaches achieve speedups in the ranges $38\%$--$43\%$ for period computation and $63\%$--$72\%$ for cover detection, highlighting the practical efficiency and potential of CDS-based string analysis for applications in compression, computational biology, and pattern recognition. The results suggest broader applicability of CDS representations for efficient regularity detection in strings.
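To make the two regularities concrete, here is a minimal brute-force sketch of the standard definitions: the period is the smallest shift under which the string matches itself, and a cover is a prefix whose occurrences overlap or abut so that every position is covered. The function names `naive_period` and `naive_shortest_cover` are hypothetical, and this is a reference illustration only, not the CDS-based method or the classical linear-time algorithms discussed in the paper.

```python
def naive_period(x):
    """Smallest p >= 1 with x[i] == x[i + p] for all valid i (0 for the empty string)."""
    n = len(x)
    for p in range(1, n + 1):
        if all(x[i] == x[i + p] for i in range(n - p)):
            return p
    return 0


def naive_shortest_cover(x):
    """Shortest prefix w of x whose occurrences cover every position of x."""
    n = len(x)
    for length in range(1, n + 1):
        w = x[:length]
        covered_up_to = 0  # positions [0, covered_up_to) are covered so far
        for start in range(n - length + 1):
            if start > covered_up_to:
                break  # a gap appeared, so w cannot be a cover
            if x.startswith(w, start):
                covered_up_to = start + length
        if covered_up_to == n:
            return w
    return x  # only reached for the empty string


print(naive_period("abaababaaba"))          # 5
print(naive_shortest_cover("abaababaaba"))  # aba
```

Both functions take quadratic time or worse in the worst case, so they are only useful for checking faster implementations on small inputs.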

Abstract

Identifying regularities in strings, such as \emph{periods} and \emph{covers}, is crucial for applications in text compression, computational biology, and pattern recognition. \emph{Characters-Distance-Sampling} (\texttt{CDS}) is an efficient technique that encodes a string by storing distances between selected pivot characters, accelerating string-processing tasks. We apply \texttt{CDS} to compute periods and shortest covers, selecting only the first character as the pivot. This strategy yields optimized computations, achieving speedups of $38\%$--$43\%$ for period computation and $63\%$--$72\%$ for cover detection. These results demonstrate the potential of \texttt{CDS}-based representations for efficient string analysis and broader applications.
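As a rough illustration of the encoding described above, the sketch below builds a single-pivot CDS representation using the first character as the pivot. The function name `cds_first_char` and the returned pair (distance list, number `k` of characters after the last pivot occurrence) are assumptions chosen to match the examples in the figure captions, not the authors' implementation.

```python
def cds_first_char(x):
    """Return (distances, k) for the pivot x[0].

    distances[i] is the gap between the i-th and (i+1)-th pivot occurrences;
    k is the number of characters after the last pivot occurrence.
    (Assumed layout, inferred from the figure captions.)
    """
    if not x:
        return [], 0
    pivot = x[0]
    positions = [i for i, c in enumerate(x) if c == pivot]
    distances = [q - p for p, q in zip(positions, positions[1:])]
    k = len(x) - 1 - positions[-1]
    return distances, k


print(cds_first_char("abaababaaba"))  # ([2, 1, 2, 2, 1, 2], 0)
print(cds_first_char("abbababbabb"))  # ([3, 2, 3], 2)
```

The distance list is never longer than the input and is much shorter when the pivot is rare, which is the intuition behind running regularity detection on the sampled sequence.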

Paper Structure (6 sections, 3 theorems, 10 equations, 3 figures)

This paper contains 6 sections, 3 theorems, 10 equations, 3 figures.

Introduction
Preliminaries
Characters-Distance-Sampling in Brief
Classical Computation of the Period of a String
Classical Computation of the Shortest Cover of a String
Computing the period of a string from its CDS representation

Key Result

Lemma

Let $x[\delta(i)] = a$ for $0 \leq i \leq \bar{m}-1$. Then:

Figures (3)

Figure 1: Border array of $x=\hbox{\tt abaababaaba}$ of length $11$.
Figure 2: Border array of the CDS representation of $x=\hbox{\tt abaababaaba}$ of length $11$ with pivot a. Then $\textit{per}(\bar{x}) = 6-\textit{border}_{\bar{x}}[6]=3$, thus $\textit{per}(x) = \bar{x}[0]+\bar{x}[1]+\bar{x}[2]=2+1+2=5$.
Figure 3: Border array of the CDS representation of $x=\hbox{\tt abbababbabb}=\hbox{\tt abbababbab}^2$ of length $11$ with pivot a, so $k = 2$. Here $\textit{border}_{\bar{x}}[3]=1$, but $\bar{x}[\textit{border}_{\bar{x}}[3]]=\bar{x}[1]=2 \leq k=2$, so this border cannot be used. Next, $\textit{border}_{\bar{x}}[1]=0$ and $\bar{x}[\textit{border}_{\bar{x}}[1]]=\bar{x}[0]=3 > k=2$, so the border of $\bar{x}$ of length $0$ is used. The corresponding period of $\bar{x}$ is $|\bar{x}|-\textit{border}_{\bar{x}}[1]=3$, and summing the first $3$ elements of $\bar{x}$ gives $\textit{per}(x) = \bar{x}[0]+\bar{x}[1]+\bar{x}[2]=3+2+3=8$.
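The procedure walked through in the figure captions can be sketched as follows: compute the border array of $\bar{x}$, follow the border chain until the element just past the border exceeds $k$, then sum the corresponding prefix of $\bar{x}$. The names `border_array`, `cds_first_char`, and `period_from_cds`, and the sentinel `border[0] = -1`, are assumptions of this sketch. It is only claimed to be valid for a binary alphabet, where the non-pivot positions all carry the same symbol; for larger alphabets the non-pivot characters would need a separate check that is not attempted here.

```python
def border_array(s):
    """border[i] = length of the longest proper border of s[:i]; border[0] = -1 (sentinel)."""
    n = len(s)
    border = [-1] * (n + 1)
    j = -1
    for i in range(n):
        while j >= 0 and s[i] != s[j]:
            j = border[j]
        j += 1
        border[i + 1] = j
    return border


def cds_first_char(x):
    """Distances between consecutive occurrences of x[0], plus the tail length k."""
    pivot = x[0]
    positions = [i for i, c in enumerate(x) if c == pivot]
    distances = [q - p for p, q in zip(positions, positions[1:])]
    return distances, len(x) - 1 - positions[-1]


def period_from_cds(x):
    """Period of a non-empty binary string x, computed on its CDS representation."""
    xbar, k = cds_first_char(x)
    m = len(xbar)
    border = border_array(xbar)
    b = border[m]
    # Skip borders whose following gap is too short to hold the k tail characters.
    while b >= 0 and xbar[b] <= k:
        b = border[b]
    if b < 0:
        return len(x)  # no usable border: the only period is |x|
    return sum(xbar[: m - b])


print(border_array("abaababaaba"))     # [-1, 0, 0, 1, 1, 2, 3, 2, 3, 4, 5, 6]
print(period_from_cds("abaababaaba"))  # 5
print(period_from_cds("abbababbabb"))  # 8
```

The border array is computed over a sequence of integers rather than characters; the same routine applies unchanged because it relies only on equality comparisons.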

Theorems & Definitions (6)

Lemma
Proof
Lemma
Proof
Lemma
Proof
