Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective

Nicolò Bonacorsi; Matteo Bordoni

TL;DR

Conditioning the block-length distribution on $m\bmod 8$ markedly improves the generator's distributional fit, indicating that low-order modular structure is a key driver of heterogeneity in $\tau(n)$.

Abstract

We study the Collatz total stopping time $\tau(n)$ over $n\le 10^7$ from a probabilistic machine learning viewpoint. Empirically, $\tau(n)$ is a skewed and heavily overdispersed count with pronounced arithmetic heterogeneity. We develop two complementary models. First, a Bayesian hierarchical Negative Binomial regression (NB2-GLM) predicts $\tau(n)$ from simple covariates ($\log n$ and residue class $n \bmod 8$), quantifying uncertainty via posterior and posterior predictive distributions. Second, we propose a mechanistic generative approximation based on the odd-block decomposition: for odd $m$, write $3m+1=2^{K(m)}m'$ with $m'$ odd and $K(m)=v_2(3m+1)\ge 1$; randomizing these block lengths yields a stochastic approximation calibrated via a Dirichlet-multinomial update. On held-out data, the NB2-GLM achieves substantially higher predictive likelihood than the odd-block generators. Conditioning the block-length distribution on $m\bmod 8$ markedly improves the generator's distributional fit, indicating that low-order modular structure is a key driver of heterogeneity in $\tau(n)$.
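To make the odd-block decomposition concrete, the following is a minimal sketch of how $K(m)=v_2(3m+1)$ and $\tau(n)$ could be computed. It assumes $\tau(n)$ counts every step of the standard map ($n\mapsto n/2$ for even $n$, $n\mapsto 3n+1$ for odd $n$) until 1 is reached; the function names are illustrative and are not taken from the paper.

```python
from typing import Tuple


def v2(x: int) -> int:
    """2-adic valuation: the largest k such that 2**k divides x (x > 0)."""
    return (x & -x).bit_length() - 1


def odd_block(m: int) -> Tuple[int, int]:
    """For odd m, return (K, m_next) with 3*m + 1 == 2**K * m_next and m_next odd."""
    if m % 2 == 0:
        raise ValueError("odd_block expects an odd integer")
    y = 3 * m + 1
    k = v2(y)
    return k, y >> k


def total_stopping_time(n: int) -> int:
    """Number of steps of the standard Collatz map needed to reach 1 (assumed convention)."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def total_stopping_time_blocks(n: int) -> int:
    """Same quantity, accumulated block by block.

    Each odd block contributes 1 step (m -> 3m + 1) plus K(m) halvings.
    """
    k0 = v2(n)          # leading halvings before the first odd value
    m = n >> k0
    steps = k0
    while m != 1:
        k, m = odd_block(m)
        steps += 1 + k
    return steps
```

Under this convention the two functions agree for every $n\ge 1$, which is the sense in which the sequence of block lengths determines $\tau(n)$; replacing the deterministic $K(m)$ with random draws gives a stochastic approximation of the kind the abstract describes.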

Paper Structure (31 sections, 23 equations, 8 figures, 2 tables)

Introduction
Dataset and question.
Two complementary models.
A note on randomness (working likelihood).
Data and exploratory analysis
Method 1: Bayesian Negative Binomial regression
Method 2: A stochastic odd-block generative model
Definition of the odd projection $\left\lfloor \cdot \right\rceil_{\text{odd}}$.
Model comparison: predictive accuracy vs. mechanistic faithfulness
Interpretation.
Discussion
Method to construct the dataset
Reproducibility details (splits, priors, and evaluation)
Train/test protocol
Priors and MCMC settings for M3
...and 16 more sections

Figures (8)

Figure 1: Empirical distribution of $\tau(n)$ for $1\le n\le N$ (integer-aligned bins, width 2) with a KDE overlay computed on a large subsample to reduce noise. This motivates an overdispersed count likelihood.
Figure 2: Scatter of $\tau(n)$ vs. $n$ (log-$x$). The mean increases slowly and is approximately linear as a function of $\log n$, while the spread grows with $n$; banding suggests modular structure, motivating $\log n$ and $n\bmod 8$ as covariates.
Figure 3: Posterior predictive check for the hierarchical NB2-GLM (Model M3). The PPC matches the bulk well and mildly overestimates extreme right-tail mass.
Figure 4: Empirical block-length distribution $\hat{p}_k$ for $K=v_2(3m+1)$ (odd $m\le N$) vs. geometric reference $2^{-k}$ on log-$y$. This evaluates the "geometric $K$" heuristic.
Figure 5: Dirichlet posterior for $(p_k)$ (log scale) with uncertainty bars, compared to the geometric reference $2^{-k}$.
...and 3 more figures
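
The comparison in Figures 4 and 5 between the block-length distribution and the geometric reference $2^{-k}$ can be illustrated with a small sketch of a Dirichlet-multinomial update. The symmetric prior `alpha = 1.0`, the truncation `k_max` (with the last bin collecting all $K \ge k_{\max}$), and the function names are assumptions made for illustration, not the paper's settings.

```python
from typing import List


def v2(x: int) -> int:
    """2-adic valuation: the largest k such that 2**k divides x (x > 0)."""
    return (x & -x).bit_length() - 1


def block_length_counts(N: int, k_max: int) -> List[int]:
    """Count K = v2(3m + 1) over odd m <= N.

    counts[k - 1] holds the count for K == k when k < k_max;
    the last entry collects every K >= k_max (assumed truncation).
    """
    counts = [0] * k_max
    for m in range(1, N + 1, 2):
        k = v2(3 * m + 1)
        counts[min(k, k_max) - 1] += 1
    return counts


def dirichlet_posterior_mean(counts: List[int], alpha: float = 1.0) -> List[float]:
    """Posterior mean of (p_k) under a symmetric Dirichlet(alpha) prior
    and a multinomial likelihood: (alpha + c_k) / sum_j (alpha + c_j)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


if __name__ == "__main__":
    counts = block_length_counts(N=100_000, k_max=10)
    post = dirichlet_posterior_mean(counts)
    for k, p in enumerate(post[:-1], start=1):
        print(f"k={k}: posterior mean {p:.5f}  geometric reference {2.0 ** -k:.5f}")
```

Because $K(m)=1$ exactly when $m\equiv 3 \pmod 4$ and $K(m)=2$ exactly when $m\equiv 1 \pmod 8$, the first two posterior means land close to $1/2$ and $1/4$. The same residue argument shows why $K$ depends on $m$ modulo small powers of 2.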
