A Tighter Upper Bound for Distinct Squares

Eitatsu Tomita, Tomohiro I

TL;DR

This work bounds the number of distinct squares $|\mathit{SQ}_{w}|$ in a word $w$ of length $n$ over an alphabet of size $\sigma$. Building on Brlek and Li's Rauzy-graph framework, the authors refine the assignment of circuits to Lyndon-root families and reserve additional circuits, yielding the tighter bound $|\mathit{SQ}_{w}| \le n - \Theta(\log n)$. This improves the previous $n-\sigma$ bound when $\sigma \in o(\log n)$. The proof is a two-case analysis that combines circuit loading, with per-circuit bounds $\text{avg}_{w}(z) \le \frac{|z|}{|z|+1}$, and a decomposition into long and short squares. The approach shows how Rauzy graphs and Lyndon-root structure capture repetition density in words, and it suggests extending these techniques to distinct $k$-powers and to circular words.

Abstract

A square is a word of the form $xx$ for a non-empty word $x$. Brlek and Li [Comb. Theory, 2025] proved that the number of distinct squares in a word of length $n$ that uses $\sigma$ distinct letters is at most $n - \sigma$. In this paper, we give a new upper bound $n - \Theta(\log n)$, improving the previous bound when $\sigma \in o(\log n)$.
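
The definition of a square can be checked directly on small words. The following is a minimal brute-force sketch, not the paper's method: the function name `distinct_squares` is hypothetical, and the cubic-time enumeration is only meant for short inputs such as the word $w = \texttt{aabaabaa}$ used in Figure 1.

```python
def distinct_squares(w: str) -> set[str]:
    """Return the set of distinct non-empty squares xx that occur as factors of w."""
    n = len(w)
    squares = set()
    for i in range(n):
        # h is the length of the candidate root x of a square starting at position i
        for h in range(1, (n - i) // 2 + 1):
            if w[i:i + h] == w[i + h:i + 2 * h]:
                squares.add(w[i:i + 2 * h])
    return squares


w = "aabaabaa"
sq = distinct_squares(w)
n, sigma = len(w), len(set(w))
print(sorted(sq))      # ['aa', 'aabaab', 'abaaba', 'baabaa']
print(len(sq), n - sigma)  # 4 6
```

For this word the count is 4, which is consistent with the $n - \sigma = 6$ bound of Brlek and Li quoted above.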

Paper Structure (4 sections, 13 theorems, 6 equations, 2 figures)

Key Result

Theorem 1

For any word $w$ of length $n$, the number of distinct squares in $w$ is upper bounded by $n - \Theta(\log n)$.
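
To see when this bound beats $n - \sigma$, it may help to unpack the $\Theta$. The sketch below assumes a constant $c > 0$ such that the subtracted term is at least $c \log n$ for all sufficiently large $n$; the name $c$ is introduced here for illustration and its value is not stated in this summary.

$$
n - c \log n < n - \sigma \iff \sigma < c \log n .
$$

If $\sigma \in o(\log n)$, then $\sigma / \log n \to 0$, so $\sigma < c \log n$ holds for all sufficiently large $n$, and Theorem 1 is then strictly stronger than the $n - \sigma$ bound.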

Figures (2)

  • Figure 1: Illustration of the Rauzy graphs $\Gamma_{w}(\ell)$ of all orders $\ell \in [0..8]$ for the word $w = \texttt{aabaabaa}$ of length $8$. Each dashed arc is the smallest arc in a circuit in $\mathit{CS}_{w}(z)$ for some $z \in \mathit{Lyn}_{w}$ (defined in \ref{def:cs}), where $\mathit{CS}_{w}(\mathtt{a}) = \{ (\mathtt{a}), (\mathtt{aa}) \}$, $\mathit{CS}_{w}(\mathtt{b}) = \{ (\mathtt{b}) \}$, $\mathit{CS}_{w}(\mathtt{ab}) = \{ (\mathtt{ab}, \mathtt{ba}) \}$ and $\mathit{CS}_{w}(\mathtt{aab}) = \{ (\mathtt{aab}, \mathtt{aba}, \mathtt{baa}), (\mathtt{aaba}, \mathtt{abaa}, \mathtt{baab}), (\mathtt{aabaa}, \mathtt{abaab}, \mathtt{baaba}), (\mathtt{aabaab}, \mathtt{abaaba}, \mathtt{baabaa})\}$. In \ref{sec:proof}, we will assign $\mathit{SQ}_{w}(\mathtt{a}) = \{ \mathtt{aa} \}$ to $\mathit{CS}_{w}(\mathtt{a})$ and $\mathit{SQ}_{w}(\mathtt{aab}) = \{ \mathtt{aabaab}, \mathtt{abaaba}, \mathtt{baabaa} \}$ to $\mathit{CS}_{w}(\mathtt{aab})$.
  • Figure 2: Illustration of the proof of \ref{lemma:sqs_cs}. Here we consider the case where $|z| = 5$, $r = 3$, $s = 3$, $k_1 = 2$, $k_2 = 3$ and $k_3 = 5$. Since $k_{2} - k_{1} = 1$, $k_{3} - k_{2} = 2$ and $k_{4} - k_{3} = k_{1} + |z| - k_{3} = 2$, we have $g = 2$. $\mathit{SQ}_{w}(z)$ contains $s$ squares of the form $x_{k_j}^{2r}$ for $j \in [1..s]$, which are aligned under $z^{\infty}$. Every factor of length $2|z|r - g + 1$ of $z^\infty$ (depicted with a red double-headed arrow) is guaranteed to be covered by $x_{k_j}^{2r}$ for some $j \in [1..s]$, and is thus a factor of $w$.
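
The circuits listed in the caption of Figure 1 can be checked mechanically. The sketch below assumes the usual definition of the Rauzy graph of order $\ell$: vertices are the factors of $w$ of length $\ell$, arcs are the factors of length $\ell + 1$, and each arc goes from its length-$\ell$ prefix to its length-$\ell$ suffix. The helper names `factors`, `rauzy_graph` and `is_circuit` are hypothetical, and circuits are read as sequences of arc labels, which is an interpretation of the caption rather than something this summary states.

```python
def factors(w: str, length: int) -> set[str]:
    """All factors (contiguous substrings) of w with the given length."""
    return {w[i:i + length] for i in range(len(w) - length + 1)}


def rauzy_graph(w: str, order: int):
    """Return (vertices, arcs); arcs maps each arc label to its (tail, head) pair."""
    vertices = factors(w, order)
    # An arc labelled f goes from the prefix f[:order] to the suffix f[1:].
    arcs = {f: (f[:order], f[1:]) for f in factors(w, order + 1)}
    return vertices, arcs


def is_circuit(arcs: dict, labels: tuple) -> bool:
    """Check that the labelled arcs exist and chain head-to-tail back to the start."""
    if any(a not in arcs for a in labels):
        return False
    m = len(labels)
    return all(arcs[labels[i]][1] == arcs[labels[(i + 1) % m]][0] for i in range(m))


w = "aabaabaa"
# Circuits from the Figure 1 caption, keyed by the order of the graph they live in.
caption_circuits = {
    0: [("a",), ("b",)],
    1: [("aa",), ("ab", "ba")],
    2: [("aab", "aba", "baa")],
    3: [("aaba", "abaa", "baab")],
    4: [("aabaa", "abaab", "baaba")],
    5: [("aabaab", "abaaba", "baabaa")],
}
for order, circuits in caption_circuits.items():
    _, arcs = rauzy_graph(w, order)
    for c in circuits:
        print(order, c, is_circuit(arcs, c))
```

Under these assumptions, each listed tuple forms a closed walk in the graph of order one less than the length of its labels.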

Theorems & Definitions (22)

  • Theorem 1
  • Example 2
  • Definition 3 [Lyndon 1954; Lothaire 1997]
  • Lemma 4
  • Remark 5
  • Definition 6
  • Definition 7
  • Theorem 8: Theorem 2, Chapter 4 in [Berge 1982]
  • Definition 9: Rauzy graph [Rauzy 1982]
  • Lemma 10
  • ...and 12 more