A Tighter Upper Bound for Distinct Squares

Eitatsu Tomita; Tomohiro I

TL;DR

This work bounds the number of distinct squares $|\mathit{SQ}_{w}|$ in a word $w$ of length $n$ over an alphabet of size $\sigma$. Building on Brlek and Li's Rauzy-graph framework, the authors refine the assignment of circuits to Lyndon-root families and establish an additional circuit reservation, yielding the tighter bound $|\mathit{SQ}_{w}| \le n - \Theta(\log n)$. The result improves the previous $n-\sigma$ bound when $\sigma \in o(\log n)$. The proof is a two-case analysis that uses circuit loading with per-circuit bounds $\text{avg}_{w}(z) \le \frac{|z|}{|z|+1}$ and a decomposition into long and short squares. The approach highlights the power of Rauzy graphs and Lyndon-root structure in understanding repetition density in words, and suggests extending these techniques to distinct $k$-powers and to circular words.

Abstract

A square is a word of the form $xx$ for a non-empty word $x$. Brlek and Li [Comb. Theory, 2025] proved that the number of distinct squares in a word of length $n$ that uses $\sigma$ distinct letters is at most $n - \sigma$. In this paper, we give a new upper bound $n - \Theta(\log n)$, improving the previous bound when $\sigma \in o(\log n)$.
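To make the quantities in the abstract concrete, the following minimal Python sketch enumerates the distinct squares of a short word by brute force and compares the count with the $n - \sigma$ bound of Brlek and Li. The function name `distinct_squares` is our own and does not come from the paper; the enumeration is deliberately naive and only intended for small examples.

```python
def distinct_squares(w: str) -> set:
    """Return the set of distinct non-empty squares xx occurring in w (brute force)."""
    n = len(w)
    squares = set()
    for start in range(n):
        # half is |x|; the square xx must fit inside w[start:].
        for half in range(1, (n - start) // 2 + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                squares.add(w[start:start + 2 * half])
    return squares


w = "aabaabaa"  # the example word used in Figure 1
sq = distinct_squares(w)
n, sigma = len(w), len(set(w))
print(sorted(sq))                # ['aa', 'aabaab', 'abaaba', 'baabaa']
print(len(sq), "<=", n - sigma)  # 4 <= 6
```

For this word the count is 4, well below $n - \sigma = 6$; the sketch only checks individual words and says nothing about the asymptotic $n - \Theta(\log n)$ bound itself.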

Paper Structure (4 sections, 13 theorems, 6 equations, 2 figures)

Introduction
Preliminaries
A new upper bound $|w| - \Theta(\log |w|)$ of $|\mathit{SQ}_{w}|$
Conclusion and future work

Key Result

Theorem 1

For any word $w$ of length $n$, the number of distinct squares in $w$ is upper bounded by $n - \Theta(\log n)$.

Figures (2)

Figure 1: The Rauzy graphs $\Gamma_{w}(\ell)$ of all orders $\ell \in [0..8]$ for the word $w = \texttt{aabaabaa}$ of length $8$. Each dashed arc is the smallest arc in a circuit in $\mathit{CS}_{w}(z)$ for some $z \in \mathit{Lyn}_{w}$ (see the definition of $\mathit{CS}_{w}$, label def:cs), where $\mathit{CS}_{w}(\mathtt{a}) = \{ (\mathtt{a}), (\mathtt{aa}) \}$, $\mathit{CS}_{w}(\mathtt{b}) = \{ (\mathtt{b}) \}$, $\mathit{CS}_{w}(\mathtt{ab}) = \{ (\mathtt{ab}, \mathtt{ba}) \}$ and $\mathit{CS}_{w}(\mathtt{aab}) = \{ (\mathtt{aab}, \mathtt{aba}, \mathtt{baa}), (\mathtt{aaba}, \mathtt{abaa}, \mathtt{baab}), (\mathtt{aabaa}, \mathtt{abaab}, \mathtt{baaba}), (\mathtt{aabaab}, \mathtt{abaaba}, \mathtt{baabaa})\}$. In the proof section (label sec:proof), we will assign $\mathit{SQ}_{w}(\mathtt{a}) = \{ \mathtt{aa} \}$ to $\mathit{CS}_{w}(\mathtt{a})$ and $\mathit{SQ}_{w}(\mathtt{aab}) = \{ \mathtt{aabaab}, \mathtt{abaaba}, \mathtt{baabaa} \}$ to $\mathit{CS}_{w}(\mathtt{aab})$.
Figure 2: Illustration for the proof of the lemma on $\mathit{SQ}_{w}(z)$ and $\mathit{CS}_{w}(z)$ (label lemma:sqs_cs). Here we consider the case where $|z| = 5$, $r = 3$, $s = 3$, $k_1 = 2$, $k_2 = 3$ and $k_3 = 5$. Since $k_{2} - k_{1} = 1$, $k_{3} - k_{2} = 2$ and $k_{4} - k_{3} = k_{1} + |z| - k_{3} = 2$, we have $g = 2$. $\mathit{SQ}_{w}(z)$ contains $s$ squares of the form $x_{k_j}^{2r}$ for $j \in [1..s]$, which are aligned under $z^{\infty}$. We see that every factor of length $2|z|r - g + 1$ of $z^\infty$ (depicted with a red double-headed arrow) is guaranteed to be covered by $x_{k_j}^{2r}$ for some $j \in [1..s]$, and is thus a factor of $w$.
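The objects in Figure 1 can be reproduced for the example word with a short Python sketch. It assumes the standard definition of the Rauzy graph of order $\ell$ (vertices are the factors of length $\ell$, arcs are the factors of length $\ell + 1$, each running from its length-$\ell$ prefix to its length-$\ell$ suffix), and it assumes that $\mathit{SQ}_{w}(z)$ collects the squares whose root is a power of a conjugate of the Lyndon word $z$, which matches the sets listed in the caption. The helper names `factors`, `rauzy_graph` and `lyndon_root` are ours, not the paper's.

```python
def factors(w, length):
    """All distinct factors (substrings) of w with the given length."""
    return {w[i:i + length] for i in range(len(w) - length + 1)}


def rauzy_graph(w, order):
    """Vertices: factors of length `order`; arcs: factors of length `order + 1`.

    Each arc u is returned as (prefix, suffix, u), running from its
    length-`order` prefix to its length-`order` suffix.
    """
    vertices = factors(w, order)
    arcs = {(u[:order], u[1:], u) for u in factors(w, order + 1)}
    return vertices, arcs


def lyndon_root(square):
    """Smallest rotation of the primitive root of x, where square = xx."""
    x = square[:len(square) // 2]
    # Primitive root: shortest prefix of x whose repetition gives x.
    p = next(p for p in range(1, len(x) + 1)
             if len(x) % p == 0 and x[:p] * (len(x) // p) == x)
    root = x[:p]
    return min(root[i:] + root[:i] for i in range(p))


w = "aabaabaa"

# Sizes of the Rauzy graphs of all orders 0..|w|.
for order in range(len(w) + 1):
    vertices, arcs = rauzy_graph(w, order)
    print(order, len(vertices), len(arcs))

# Group the distinct squares of w by Lyndon root (assumed meaning of SQ_w(z)).
squares = {u for length in range(2, len(w) + 1, 2)
           for u in factors(w, length)
           if u[:length // 2] == u[length // 2:]}
by_root = {}
for u in sorted(squares):
    by_root.setdefault(lyndon_root(u), []).append(u)
print(by_root)  # {'a': ['aa'], 'aab': ['aabaab', 'abaaba', 'baabaa']}
```

The order-2 graph has exactly the three arcs `aab`, `aba`, `baa`, which form the first circuit listed in $\mathit{CS}_{w}(\mathtt{aab})$. The sketch does not compute the circuit sets $\mathit{CS}_{w}(z)$ themselves, since their precise definition is given only in the paper.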

Theorems & Definitions (22)

Theorem 1
Example 2
Definition 3: Lyndon (1954), On Burnside's Problem; Lothaire (1997), Combinatorics on Words
Lemma 4
Remark 5
Definition 6
Definition 7
Theorem 8: Theorem 2, Chapter 4 in Berge (1982), The Theory of Graphs and Its Applications
Definition 9: Rauzy graph, Rauzy (1982), Suites à termes dans un alphabet fini
Lemma 10
...and 12 more
