Multiperiodic Processes: Ergodic Sources with a Sublinear Entropy
Łukasz Dębowski
TL;DR
Multiperiodic processes provide a rigorously tractable, ergodic but non-mixing toy model that achieves Hilberg's law with a vanishing entropy rate by embedding Zipf-like type frequencies in randomly shifted deterministic sequences. The Infinite Clock algorithm generates these multiperiodic sequences, and the paper develops a suite of statistics (relative frequencies, waiting times, number of observed types) and information-theoretic bounds (seed estimation, block entropy) to characterize the model. It illustrates two regimes, constant and linear periods, showing how period growth controls type-token growth and entropy properties and how, under moment conditions, the model can realize Hilberg-type power laws with tunable exponents. The work connects to broader themes in linguistic statistics and neural scaling, offering a transparent framework that aligns Zipf's law with long-range dependencies observed in language data.
Abstract
We construct multiperiodic processes, a simple example of stationary ergodic (but not mixing) processes over the natural numbers that have a vanishing entropy rate under a mild condition. Multiperiodic processes are supported on randomly shifted deterministic sequences called multiperiodic sequences, which can be efficiently generated using an algorithm called the Infinite Clock. Under a suitable parameterization, multiperiodic sequences exhibit relative frequencies of particular numbers given by Zipf's law. In exactly the same setting, the respective multiperiodic processes satisfy an asymptotic power-law growth of block entropy, called Hilberg's law. Hilberg's law is deemed to hold, in particular, for statistical language models.
