Universal parameterized family of distributions of runs
Hayato Takahashi
TL;DR
This work develops explicit, parameterized probability distributions for runs and nonoverlapping words in i.i.d. finite-valued sequences, including μ-overlapping generalizations for binary data. It introduces a universal framework based on increasing nonoverlapping words and generating functions to produce exact distribution formulas and moments for these statistics. The author shows that the formula for generalized run probabilities can be computed in a number of arithmetic operations linear in the sample size when the number of parameters and the range are fixed, and uses an asymptotic formula for integer partitions, obtained from Meinardus's theorem, to analyze the cost for large parameter families. The work also analyzes convergence of the distributions as the word set grows and recovers Mood's exact run probabilities as a special case.
Abstract
We present explicit formulae for parameterized families of probabilities of the number of nonoverlapping words and of increasing nonoverlapping words in independent and identically distributed (i.i.d.) finite-valued random variables. We then provide an explicit formula for a parameterized family of probabilities of the number of runs, which generalizes \(μ\)-overlapping probabilities for \(μ\geq 0\) in i.i.d. binary-valued random variables. We also derive exact probabilities of the number of runs whose sizes are exactly the given numbers (Mood 1940). The number of arithmetic operations required to compute our formula for generalized probabilities of runs is of linear order in the sample size for a fixed number of parameters and a fixed range. To analyse the number of arithmetic operations for an unbounded number of parameters, we show an asymptotic formula for the number of integer partitions that are less than or equal to a given number, as a special case of Meinardus's theorem.
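As a concrete illustration of the statistic in Mood's setting, the following is a minimal brute-force sketch, not the explicit formula of this paper. It assumes that a "run of size k" means a maximal block of consecutive 1s of length exactly k; the function names and the choice P(1) = p are illustrative assumptions. It enumerates all 2^n binary sequences, so it takes exponential time and is only useful for checking small cases.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby, product


def count_runs_of_exact_length(seq, k, symbol=1):
    """Count maximal runs of `symbol` in `seq` whose length is exactly k."""
    return sum(
        1
        for value, block in groupby(seq)
        if value == symbol and len(list(block)) == k
    )


def exact_run_distribution(n, k, p=Fraction(1, 2)):
    """Brute-force distribution of the number of exact-length-k runs of 1s.

    Assumes n i.i.d. binary variables with P(1) = p. Exponential in n.
    """
    dist = Counter()
    for seq in product((0, 1), repeat=n):
        ones = sum(seq)
        # Probability of this particular sequence under the i.i.d. model.
        prob = p**ones * (1 - p) ** (n - ones)
        dist[count_runs_of_exact_length(seq, k)] += prob
    return dict(dist)
```

For example, `exact_run_distribution(3, 1)` returns the probabilities of observing 0, 1, or 2 isolated 1s in three fair coin flips as exact fractions, which can be compared against a closed-form formula for small n.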
