Efficient Computation of Periods and Covers Using Sampling
Thierry Lecroq, Francesco Pio Marino
TL;DR
This paper applies Characters-Distance-Sampling (CDS) to the computation of two fundamental string regularities: the period and the shortest cover. It develops algorithms that operate directly on the CDS representation with a single pivot (the first character of the string), preserving linear-time behavior while running faster than classical methods. Empirically, the CDS-based approaches achieve speedups of $38\%$--$43\%$ for period computation and $63\%$--$72\%$ for cover detection. These results point to the practical value of CDS-based string analysis in compression, computational biology, and pattern recognition, and suggest that CDS representations may apply more broadly to regularity detection in strings.
Abstract
Identifying regularities in strings, such as \emph{periods} and \emph{covers}, is crucial for applications in text compression, computational biology, and pattern recognition. \emph{Characters-Distance-Sampling} (\texttt{CDS}) is an efficient technique that encodes a string by storing the distances between occurrences of selected pivot characters, accelerating string-processing tasks. We apply \texttt{CDS} to compute periods and shortest covers, selecting only the first character of the string as the pivot. This single-pivot strategy achieves speedups of $38\%$--$43\%$ for period computation and $63\%$--$72\%$ for cover detection. These results demonstrate the potential of \texttt{CDS}-based representations for efficient string analysis and broader applications.
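To make the single-pivot idea concrete, the following is a minimal illustrative sketch and not the algorithm developed in the paper. It relies on one elementary fact: if $p < n$ is a period of a string $x$ of length $n$, then $x[p] = x[0]$, so only positions where the first character occurs can be periods. All function names are hypothetical, and the sampled variant verifies each candidate naively, so it does not reproduce the linear-time bound claimed above; the classical border-table routine is included only as a reference to compare against.

```python
def pivot_positions(x):
    """Positions where the pivot (the first character of x) occurs."""
    return [i for i, c in enumerate(x) if c == x[0]] if x else []


def pivot_distances(x):
    """Distances between consecutive pivot occurrences (a CDS-style sampling)."""
    pos = pivot_positions(x)
    return [b - a for a, b in zip(pos, pos[1:])]


def is_period(x, p):
    """True if x[i] == x[i + p] for every valid index i."""
    return all(x[i] == x[i + p] for i in range(len(x) - p))


def smallest_period_sampled(x):
    """Smallest period, testing only pivot positions as candidates.

    Illustrative assumption: each candidate is checked naively, so this
    sketch can take quadratic time in the worst case.
    """
    for p in pivot_positions(x)[1:]:  # skip position 0 (the trivial shift)
        if is_period(x, p):
            return p
    return len(x)  # no proper period: the string length is always a period


def smallest_period_classical(x):
    """Reference: smallest period as n minus the longest border length."""
    n = len(x)
    if n == 0:
        return 0
    fail = [0] * n  # fail[i] = longest border length of x[:i + 1]
    k = 0
    for i in range(1, n):
        while k > 0 and x[i] != x[k]:
            k = fail[k - 1]
        if x[i] == x[k]:
            k += 1
        fail[i] = k
    return n - fail[-1]


if __name__ == "__main__":
    x = "abaababaab"
    print(pivot_distances(x))
    print(smallest_period_sampled(x), smallest_period_classical(x))
```

For `x = "abaababaab"`, the pivot `a` occurs at positions 0, 2, 3, 5, 7, 8, so only five shifts are tested instead of nine; the sampled and classical routines should agree on every input.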
