A Tighter Upper Bound for Distinct Squares
Eitatsu Tomita, Tomohiro I
TL;DR
This work bounds the number of distinct squares $|\mathit{SQ}_{w}|$ in a word $w$ of length $n$ over an alphabet of size $\sigma$. Building on Brlek and Li's Rauzy-graph framework, the authors refine the assignment of circuits to Lyndon-root families and establish an additional circuit reservation, yielding the tighter bound $|\mathit{SQ}_{w}| \le n - \Theta(\log n)$. This improves the previous $n-\sigma$ bound when $\sigma \in o(\log n)$. The proof is a two-case analysis that combines circuit loading, with per-circuit bounds $\text{avg}_{w}(z) \le \frac{|z|}{|z|+1}$, and a decomposition into long and short squares. The approach shows how Rauzy graphs and Lyndon-root structure can be used to analyze repetition density in words, and it suggests extending these techniques to distinct $k$-powers and to circular words.
Abstract
A square is a word of the form $xx$ for a non-empty word $x$. Brlek and Li [Comb. Theory, 2025] proved that the number of distinct squares in a word of length $n$ that uses $\sigma$ distinct letters is at most $n - \sigma$. In this paper, we give a new upper bound $n - \Theta(\log n)$, improving the previous bound when $\sigma \in o(\log n)$.
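To make the counted quantity concrete, the following minimal sketch enumerates the distinct squares of a word by brute force and compares the count with the $n - \sigma$ bound of Brlek and Li. It is an illustration only, not the method of the paper; the function names `distinct_squares` and `brlek_li_bound` are hypothetical and chosen here for readability.

```python
from typing import Set


def distinct_squares(w: str) -> Set[str]:
    """Return the set of distinct squares xx (x non-empty) occurring in w.

    Brute force: try every start position and every half-length.
    This is for illustration only and is not meant to be efficient.
    """
    n = len(w)
    found: Set[str] = set()
    for i in range(n):
        # The square w[i : i + 2*half] must fit inside w.
        for half in range(1, (n - i) // 2 + 1):
            if w[i:i + half] == w[i + half:i + 2 * half]:
                found.add(w[i:i + 2 * half])
    return found


def brlek_li_bound(w: str) -> int:
    """The bound n - sigma, where sigma is the number of distinct letters in w."""
    return len(w) - len(set(w))


if __name__ == "__main__":
    w = "aabaab"
    squares = distinct_squares(w)
    # The squares of "aabaab" are "aa" and "aabaab", so the count is 2,
    # while n - sigma = 6 - 2 = 4.
    print(sorted(squares), len(squares), brlek_li_bound(w))
```

The set collects each square once regardless of how many times it occurs, which matches the notion of distinct squares: in `"aabaab"` the square `aa` occurs twice but is counted once.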
