
Maximal number of subword occurrences in a word

Wenjie Fang

TL;DR

This work investigates the maximal density of subword occurrences in words by introducing the subword entropy $S_{\mathrm{sw}}(w)=\log_2\operatorname{maxocc}(w)$ and studying the minimal entropy $\min S_{\mathrm{sw}}^{(k)}(n)$ over words of length $n$ on an alphabet of size $k$. It proves that the limit $L_k=\lim_{n\to\infty}\min S_{\mathrm{sw}}^{(k)}(n)/n$ exists and is finite, and gives general upper and lower bounds on the minimal entropy. For the binary case, the author derives sharper upper bounds for $L_2$ by analyzing families of periodic words, computing explicit generating functions and saddle-point asymptotics for maximal subword occurrences, and complementing these with extensive experimental data. The work also establishes a general rationality result for $f_{w,v}(x,y)$, which enables a systematic analytic-combinatorics approach to the asymptotics, and poses open questions and conjectures about the structure of minimizing words and the growth of $\min S_{\mathrm{sw}}^{(k)}(n)$. The findings advance the understanding of pattern density in words and provide a framework for more precise characterizations of subword occurrence limits.

Abstract

We consider the number of occurrences of subwords (non-consecutive sub-sequences) in a given word. We first define the notion of subword entropy of a given word, which measures the maximal number of occurrences among all possible subwords. We then give upper and lower bounds on the minimal subword entropy for words of fixed length over a fixed alphabet, and also show that the minimal subword entropy per letter has a limit value. A better upper bound on the minimal subword entropy for a binary alphabet is then obtained by looking at certain families of periodic words. We also give some conjectures based on experimental observations.
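
To make the definitions concrete, here is a minimal Python sketch, not taken from the paper, that counts occurrences of a subword and brute-forces the subword entropy $S_{\mathrm{sw}}(w)=\log_2\operatorname{maxocc}(w)$ for short words. The function names (`occ`, `subword_counts`, `maxocc`, `subword_entropy`, `min_maxocc`) are illustrative choices, and the empty word is assumed to count as a subword with exactly one occurrence. The enumeration is exponential in the word length, so it is only suitable for small experiments.

```python
from collections import Counter
from itertools import product
from math import log2


def occ(w: str, u: str) -> int:
    """Number of occurrences of u as a (not necessarily consecutive) subword of w."""
    # ways[j] = number of ways to embed the prefix u[:j] in the letters of w read so far
    ways = [1] + [0] * len(u)
    for c in w:
        # Go right to left so that each letter of w is used at most once per embedding
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def subword_counts(w: str) -> Counter:
    """Map every subword of w to its number of occurrences (exponential size)."""
    counts = Counter({"": 1})
    for c in w:
        # Snapshot the current entries: each old embedding may be extended by c
        for s, m in list(counts.items()):
            counts[s + c] += m
    return counts


def maxocc(w: str) -> int:
    """Maximal number of occurrences over all subwords of w."""
    return max(subword_counts(w).values())


def subword_entropy(w: str) -> float:
    """S_sw(w) = log2(maxocc(w))."""
    return log2(maxocc(w))


def min_maxocc(n: int, alphabet: str) -> int:
    """Minimum of maxocc over all words of length n on the given alphabet (brute force)."""
    return min(maxocc("".join(t)) for t in product(alphabet, repeat=n))


print(occ("abab", "ab"))     # 3
print(maxocc("aaaa"))        # 6, attained by "aa"
print(min_maxocc(3, "ab"))   # 2
```

For example, every binary word of length 3 repeats a letter, so some single-letter subword occurs at least twice, which matches the value 2 returned by `min_maxocc(3, "ab")`.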

Paper Structure

This paper contains 10 sections, 22 theorems, 44 equations, 2 tables.

Key Result

Lemma 3.1

For words $w, w', u, u'$, we have $\operatorname{occ}(w \cdot w', u \cdot u') \geq \operatorname{occ}(w, u) \operatorname{occ}(w', u')$.
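
The inequality can be checked exhaustively on short words. The sketch below is an illustration rather than the paper's proof: it tests the lemma for all binary words up to a small length, using a hypothetical helper `lemma_holds` and a standard dynamic-programming count for $\operatorname{occ}$. One immediate consequence, obtained by choosing $u$ and $u'$ to be most frequent subwords of $w$ and $w'$, is $\operatorname{maxocc}(w \cdot w') \geq \operatorname{maxocc}(w)\operatorname{maxocc}(w')$; how the paper uses the lemma is not shown in this excerpt.

```python
from itertools import product


def occ(w: str, u: str) -> int:
    """Number of occurrences of u as a (not necessarily consecutive) subword of w."""
    ways = [1] + [0] * len(u)
    for c in w:
        for j in range(len(u), 0, -1):
            if u[j - 1] == c:
                ways[j] += ways[j - 1]
    return ways[len(u)]


def words_up_to(max_len: int, alphabet: str = "ab"):
    """Yield all words of length 0..max_len on the alphabet."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


def lemma_holds(max_len: int, alphabet: str = "ab") -> bool:
    """Check occ(w.w', u.u') >= occ(w, u) * occ(w', u') for all short words."""
    words = list(words_up_to(max_len, alphabet))
    return all(
        occ(w + w2, u + u2) >= occ(w, u) * occ(w2, u2)
        for w, w2, u, u2 in product(words, repeat=4)
    )


print(lemma_holds(3))  # True

# The inequality can be strict: occurrences may straddle the cut between w and w'.
print(occ("ab" + "ab", "a" + "b"), occ("ab", "a") * occ("ab", "b"))  # 3 1
```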

Theorems & Definitions (47)

  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Proposition 3.3
  • proof
  • Proposition 3.4
  • proof
  • Corollary 3.5
  • proof
  • ...and 37 more