Programs as Singularities

Daniel Murfet, Will Troiani

TL;DR

The paper constructs a structure-preserving bridge between discrete Turing machines and the geometry of real-analytic singularities by embedding TM codes into a smooth parameter space of noisy codes and associating each TM with a germ $([M],L)$, where $L$ is the average negative log-likelihood. It then shows that the Taylor expansion of the accompanying polynomial $H$ encodes the combinatorics of error syndromes, linking local geometry to the internal structure of programs, and connects this to Bayesian inference via the local learning coefficient $\text{RLCT}_{W_{[M]}}(K;\boldsymbol{\rho})$ and Watanabe’s free energy framework. The work demonstrates how features of TM design, such as runtime error correction and control-flow modularity, shape the Hessian and the spectrum of $H$, with explicit analysis on the detectA family illustrating nondegenerate and degenerate directions that reflect internal structure. A central takeaway is a philosophical shift toward structural Bayesianism: the posterior distribution not only selects for predictive accuracy but also for internal algorithmic organization, suggesting a principled path toward interpretability in singular models and neural networks alike.

Abstract

We develop a correspondence between the structure of Turing machines and the structure of singularities of real analytic functions, based on connecting the Ehrhard-Regnier derivative from linear logic with the role of geometry in Watanabe's singular learning theory. The correspondence works by embedding ordinary (discrete) Turing machine codes into a family of noisy codes which form a smooth parameter space. On this parameter space we consider a potential function which has Turing machines as critical points. By relating the Taylor series expansion of this potential at such a critical point to combinatorics of error syndromes, we relate the local geometry to internal structure of the Turing machine. The potential in question is the negative log-likelihood for a statistical model, so that the structure of the Turing machine and its associated singularity is further related to Bayesian inference. Two algorithms that produce the same predictive function can nonetheless correspond to singularities with different geometries, which implies that the Bayesian posterior can discriminate between distinct algorithmic implementations, contrary to a purely functional view of inference. In the context of singular learning theory our results point to a more nuanced understanding of Occam's razor and the meaning of simplicity in inductive inference.

Paper Structure

This paper contains 62 sections, 26 theorems, 267 equations, 17 figures, 7 tables.

Key Result

Lemma 2.11

If the true function is realisable by $w^*$ then the KL divergence is comparable to the polynomial $\tfrac{1}{2} H(w)$ in some open neighbourhood of $w^*$. In particular, the average negative log-likelihood $L_\mu(w)$ is comparable to $\tfrac{1}{2} H(w) + L^\mu_0$ where $L^\mu_0$ is a constant, the entropy of the true distribution $q_\mu$.
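
A sketch of how the two statements of the lemma fit together, assuming $L_\mu$ is the average negative log-likelihood (as in the TL;DR) and writing $K_\mu(w)$, a symbol introduced here only for illustration, for the KL divergence from $q_\mu$ to the model at $w$. The cross-entropy splits into entropy plus KL divergence, and "comparable" is read here as two-sided bounds with positive constants $c_1, c_2$, which is an assumption about the paper's convention:

$$
L_\mu(w) = K_\mu(w) + L^\mu_0, \qquad c_1 \cdot \tfrac{1}{2} H(w) \;\le\; K_\mu(w) \;\le\; c_2 \cdot \tfrac{1}{2} H(w) \quad \text{for } w \text{ near } w^*.
$$

Under this reading, $L_\mu - L^\mu_0$ and $\tfrac{1}{2} H$ vanish on the same set in a neighbourhood of $w^*$.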

Figures (17)

  • Figure 1: The parameter space $W$ with coordinates $w_1,\ldots,w_d$ parametrises "noisy codes" which specify for each entry on the description tape of a UTM a distribution over possible values. When executing such a noisy Turing machine code on an input $x$ for $t$ steps of the simulated machine the UTM can experience read errors when accessing the code on the description tape, and the pattern of such errors determines an error syndrome. Any particular error syndrome may flip the output of the simulated machine from the correct output $y = y(x)$ to the incorrect output $\overline{y(x)}$.
  • Figure 2: The relationship between program structure and geometry we establish works by relating both to the combinatorics of error syndromes, which are patterns of flips in bits on the description tape of a UTM which affect the output of the simulated machine.
  • Figure 3: The plain proof $\psi$ has $r$ inputs, and $n_i$ computation paths leading from the $i$th input. In this depiction of an error syndrome $\gamma$, integers $1 \le j \le p_i$ are represented as colours with a different set of colours (proofs) for each $i$. Here for example $\gamma_1$ is the function corresponding to the integer sequence $1,2,1,2,1,0,2,2,0$. The corresponding factor in the monomial associated to $\gamma$ in $f^\tau_\psi$ is $(x^1_0)^2(x^1_1)^3(x^1_2)^4$ with $x^i_0$ factors being associated to "no error" (the integer $0$). Note that different orderings of the integer sequence represent distinct error syndromes but contribute the same monomial; this is the origin of coefficients in $f^\tau_\psi$ other than $1$.
  • Figure 4: The staged pseudo-UTM $\mathcal{U}$. The nodes, except for go to compSymbol, are states of $\mathcal{U}$ and an arrow $q \to q'$ has the following interpretation: if the UTM is in state $q$ and sees the tape symbols (on the four tapes) as indicated by the source of the arrow, then the UTM transitions to state $q'$, writes the indicated symbols (or if there is no write instruction, simply rewrites the same symbols back onto the tapes), and performs the indicated movements of each of the tape heads with $R, S, L$ standing for Right, Stay, Left. The symbols $a,c,d,e$ stand for generic symbols which are not $X$, and $b$ stands for a generic symbol (which may be $X$).
  • Figure 5: Directed graphical model for a cycle of $\mathcal{U}$. Shown is a complete cycle of the staged pseudo-UTM $\mathcal{U}$. Vertices represent random variables and arrows show the dependence relations. Columns are respectively random variables representing the squares on the description tape (column $1$, counting from the left), staging tape (columns $2$ to $4$), state tape (column $5$), work tape squares, UTM state (second column from the right) and the timestep $\mu$ of the UTM within the cycle (rightmost column). Shown for reference is a random variable $\theta$ corresponding to a $\sigma'$ entry on the description tape.
  • ...and 12 more figures
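
The monomial bookkeeping described in the Figure 3 caption can be checked with a small sketch. The function name `syndrome_factor` is hypothetical and not from the paper. The ordering count additionally assumes that every rearrangement of the integer sequence is a valid error syndrome, in which case the number of syndromes sharing a monomial factor is a multinomial coefficient.

```python
from collections import Counter
from math import factorial


def syndrome_factor(seq):
    """Summarise one input's integer sequence from an error syndrome.

    Returns (exponents, orderings):
      exponents maps each integer j to the power of x^i_j in the monomial factor;
      orderings is the number of distinct rearrangements of seq, i.e. the
      multinomial coefficient len(seq)! / prod(count_j!).
    """
    counts = Counter(seq)
    exponents = {j: counts[j] for j in sorted(counts)}
    orderings = factorial(len(seq))
    for c in counts.values():
        orderings //= factorial(c)  # exact: each partial quotient is an integer
    return exponents, orderings


# The sequence for gamma_1 quoted in the Figure 3 caption.
gamma_1 = [1, 2, 1, 2, 1, 0, 2, 2, 0]
exponents, orderings = syndrome_factor(gamma_1)
print(exponents)  # {0: 2, 1: 3, 2: 4}, matching (x^1_0)^2 (x^1_1)^3 (x^1_2)^4
print(orderings)  # 1260 = 9! / (2! * 3! * 4!)
```

The exponents reproduce the factor stated in the caption. The ordering count illustrates why coefficients other than $1$ can appear, under the stated assumption.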

Theorems & Definitions (103)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Remark 2.7
  • Definition 2.8
  • Definition 2.9
  • Definition 2.10
  • ...and 93 more