Recovery of cyclic words by their subwords
Sergey Luchinin, Svetlana Puzynina, Michaël Rao
TL;DR
The paper studies the reconstruction of cyclic binary words from their scattered subwords and proves a near-optimal upper bound: any two distinct cyclic binary words of length $n$ have a distinguishing subword of length at most $\frac{3}{4}n+4$, while there exist pairs with the same set of scattered subwords of length $\frac{3}{4}n-\frac{3}{2}$. The authors develop a combinatorial framework that classifies words into special, periodic, and other forms, and they use tools such as unioccurrent subwords, turns, and 1-overlays to construct distinguishing subwords. The main contribution is this near-sharp bound on the subword length needed for unique reconstruction, established through a structured proof with multiple case analyses and matching lower-bound constructions. Open questions remain about the exact values and about multiplicities. The result quantifies how much subword information is needed to identify a cyclic word and informs related string reconstruction problems.
Abstract
The problem of reconstructing words from their subwords asks for the minimum amount of information, such as the multiset of scattered subwords of a given length or the frequencies of scattered subwords from a given set, needed to identify a word uniquely. In this paper we show that a cyclic word of length $n$ over a binary alphabet can be reconstructed from its scattered subwords of length $\frac34n+4$, and that for each $n$ one can find two distinct cyclic words of length $n$ which have the same set of scattered subwords of length $\frac34n-\frac32$.
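To make the objects in the abstract concrete, the following brute-force sketch computes, for two small binary words read cyclically, the smallest length at which their sets of scattered subwords differ. It is an illustration only, not the authors' method. It assumes the usual convention that a scattered subword of a cyclic word is a scattered subword (a subsequence) of one of its rotations, and that both inputs have the same length; the function names `rotations`, `cyclic_subwords`, `canonical`, and `shortest_distinguishing_length` are hypothetical.

```python
from itertools import combinations


def rotations(w):
    """All linear representatives (rotations) of the cyclic word w."""
    return {w[i:] + w[:i] for i in range(len(w))}


def cyclic_subwords(w, k):
    """Set of length-k scattered subwords of w read cyclically.

    Assumption: a scattered subword of a cyclic word is a subsequence
    of at least one of its rotations.
    """
    return {"".join(c) for r in rotations(w) for c in combinations(r, k)}


def canonical(w):
    """Lexicographically least rotation, used to compare cyclic words."""
    return min(rotations(w))


def shortest_distinguishing_length(u, v):
    """Smallest k at which the length-k subword sets of u and v differ.

    Assumes len(u) == len(v). Returns None when u and v are rotations
    of each other, i.e. the same cyclic word.
    """
    if canonical(u) == canonical(v):
        return None
    for k in range(1, len(u) + 1):
        if cyclic_subwords(u, k) != cyclic_subwords(v, k):
            return k
    return None  # unreachable for distinct cyclic words of equal length


if __name__ == "__main__":
    print(shortest_distinguishing_length("0011", "0101"))
```

The search is exponential in the word length, so it is only suitable for experimenting with short words. At $k = n$ the subword sets are exactly the sets of rotations, so two distinct cyclic words of equal length are always separated by some $k \le n$; the paper's result is that $k \le \frac34n+4$ always suffices.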
