Multiplicative Turing Ensembles, Pareto's Law, and Creativity
Alexander Kolpakov, Aidan Rocke
TL;DR
The paper introduces the Multiplicative Turing Ensemble (MTE), a prime-multiplier Markov process grounded in probabilistic Turing machines and encoded via Elias' $\omega$ codelength. Applying a maximum-entropy principle with energy $E(n)=\ell_\omega(n)$, it derives a canonical Gibbs prior on primes; a scaled version with $\beta>1$ yields finite moments and Pareto tails for additive gaps. It proves almost-sure convergence of time-averaged codelengths along MTE trajectories even though the chain is transient, and shows that real-world code-size distributions (Debian and PyPI) are better captured by a scaled-$\omega$ prior than by the pure $\omega$ prior or a uniform baseline. The work connects algorithmic information theory, tail behavior, and empirical distributions of complexity, offering a framework for distinguishing machine-driven from human-driven complexity in practical datasets. More broadly, it gives a principled model of multiplicative integer dynamics, with implications for complexity, coding, and Pareto-like phenomena in computational settings.
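To make the energy $E(n)=\ell_\omega(n)$ concrete, the following minimal sketch computes the bit length of the standard Elias omega code and builds a scaled Gibbs prior on primes. Several choices here are assumptions rather than the paper's definitions: the weights are taken as $2^{-\beta\,\ell_\omega(p)}$ (base 2), the prime set is truncated at a hypothetical bound `limit` so that it can be normalized numerically, and the integer code length is used where the paper may work with a continuous variant. The function names are illustrative only.

```python
from math import fsum, isqrt


def elias_omega_length(n: int) -> int:
    """Bit length of the standard Elias omega code of a positive integer n."""
    if n < 1:
        raise ValueError("Elias omega coding is defined for n >= 1")
    length = 1  # the terminating "0" bit
    while n > 1:
        bits = n.bit_length()  # binary group written for the current n
        length += bits
        n = bits - 1  # next group encodes (length of this group - 1)
    return length


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            # Cross out multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def gibbs_prior_on_primes(beta: float, limit: int) -> dict[int, float]:
    """Truncated Gibbs prior P(p) proportional to 2**(-beta * l_omega(p)).

    Assumption: base-2 weights and truncation at `limit` are illustrative
    choices, not taken from the paper.
    """
    primes = primes_up_to(limit)
    weights = [2.0 ** (-beta * elias_omega_length(p)) for p in primes]
    total = fsum(weights)
    return {p: w / total for p, w in zip(primes, weights)}


if __name__ == "__main__":
    prior = gibbs_prior_on_primes(beta=1.5, limit=1000)
    for p in (2, 3, 5, 7, 11):
        print(p, elias_omega_length(p), round(prior[p], 4))
```

Because the weight depends on $p$ only through its codelength, primes with equal codelength (for example 2 and 3) receive equal mass, and larger $\beta$ concentrates the prior on small primes.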
Abstract
We study integer-valued multiplicative dynamics driven by i.i.d. prime multipliers and connect their macroscopic statistics to universal codelengths. We introduce the Multiplicative Turing Ensemble (MTE) and show how it arises naturally, though not uniquely, from ensembles of probabilistic Turing machines. Our modeling principle is variational: taking Elias' Omega codelength as an energy and imposing maximum-entropy constraints yields a canonical Gibbs prior on integers and, by restriction, on primes. Under mild tail assumptions, this prior induces exponential tails for log-multipliers (up to slowly varying corrections), which in turn generate Pareto tails for additive gaps. We also prove time-average laws for the Omega codelength along MTE trajectories. Empirically, on Debian and PyPI package size datasets, a scaled Omega prior achieves the lowest KL divergence against codelength histograms. Taken together, the theory-data comparison suggests a qualitative split: machine-adapted regimes (Gibbs-aligned, finite first moment) exhibit clean averaging behavior, whereas human-generated complexity appears to sit beyond this regime, with tails heavy enough to produce an unbounded first moment, and therefore no averaging of the same kind.
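The step from exponential tails for log-multipliers to Pareto tails can be illustrated by a standard change of variables. The sketch below is stated for a generic multiplier $p$ with a hypothetical tail index $\alpha>0$ and a slowly varying function $L$; neither symbol is taken from the paper, and its precise statement for additive gaps may differ in form.

$$
\Pr(\log p > t) = L(e^{t})\,e^{-\alpha t}
\quad\Longrightarrow\quad
\Pr(p > x) = \Pr(\log p > \log x) = L(x)\,x^{-\alpha},
$$

$$
\mathbb{E}[p] = \int_{0}^{\infty} \Pr(p > x)\,dx
\;\begin{cases}
< \infty, & \alpha > 1,\\
= \infty, & \alpha < 1.
\end{cases}
$$

Under these assumptions, an exponential tail in $\log p$ is exactly a power-law (Pareto) tail in $p$, and the threshold $\alpha = 1$ separates the finite-first-moment regime from the unbounded one; the boundary case $\alpha = 1$ depends on $L$.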
