Thermodynamic Response Functions in Singular Bayesian Models

Sean Plummer

Abstract

Singular statistical models, including mixtures, matrix factorization, and neural networks, violate regular asymptotics due to parameter non-identifiability and degenerate Fisher geometry. Although singular learning theory characterizes marginal likelihood behavior through invariants such as the real log canonical threshold (RLCT) and singular fluctuation, these quantities remain difficult to interpret operationally. At the same time, widely used criteria such as WAIC and WBIC appear disconnected from the underlying singular geometry. We show that posterior tempering induces a one-parameter deformation of the posterior distribution whose associated observables generate a hierarchy of thermodynamic response functions. A universal covariance identity links derivatives of tempered expectations to posterior fluctuations, placing WAIC, WBIC, and singular fluctuation within a unified response framework. Within this framework, classical quantities from singular learning theory acquire natural thermodynamic interpretations: the RLCT governs the leading free-energy slope, singular fluctuation corresponds to the curvature of the tempered free energy, and WAIC measures predictive fluctuation. We formalize an observable algebra that quotients out non-identifiable directions, allowing structurally meaningful order parameters to be constructed in singular models. Across canonical singular examples, including symmetric Gaussian mixtures, reduced-rank regression, and overparameterized neural networks, we empirically demonstrate phase-transition-like behavior under tempering: order parameters collapse, susceptibilities peak, and complexity measures align with structural reorganization in posterior geometry. Our results suggest that thermodynamic response theory provides a natural organizing framework for interpreting complexity, predictive variability, and structural reorganization in singular Bayesian learning.
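
The covariance identity mentioned in the abstract can be sketched in a standard form. The notation below is an assumption rather than the paper's own: $\varphi$ denotes the prior, $L_n$ the empirical negative log-likelihood per observation, and $Z_n(\beta)$ the tempered normalizing constant. The paper's exact conventions and scaling may differ.

```latex
\begin{align*}
p_\beta(\theta) &= \frac{\varphi(\theta)\, e^{-\beta n L_n(\theta)}}{Z_n(\beta)},
\qquad
Z_n(\beta) = \int \varphi(\theta)\, e^{-\beta n L_n(\theta)}\, d\theta, \\
\frac{d}{d\beta}\,\mathbb{E}_\beta[f]
&= -n\,\mathbb{E}_\beta[f L_n] + n\,\mathbb{E}_\beta[f]\,\mathbb{E}_\beta[L_n]
 = -n\,\mathrm{Cov}_\beta(f, L_n), \\
F_n(\beta) &= -\log Z_n(\beta),
\qquad
F_n'(\beta) = n\,\mathbb{E}_\beta[L_n],
\qquad
F_n''(\beta) = -n^2\,\mathrm{Var}_\beta(L_n).
\end{align*}
```

The second line follows by differentiating under the integral and using $Z_n'(\beta)/Z_n(\beta) = -n\,\mathbb{E}_\beta[L_n]$. Under these assumptions, the slope and curvature of the tempered free energy are the first two cumulants of the total loss $nL_n$, which is the sense in which slope and curvature can carry complexity information.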

Paper Structure (51 sections, 5 theorems, 48 equations, 4 figures)


Key Result

Proposition 1

Let $f:\Theta\to\mathbb{R}$ be a measurable function.

Figures (4)

  • Figure 1: Order parameters and susceptibilities arise from the covariance identity. The response-speed bound links rates of change to fluctuation magnitudes and connects naturally to heat capacity.
  • Figure 2: Response hierarchy for the mixture symmetry-breaking experiment. The top panel shows the order parameter $m(\beta) = \mathbb{E}_{\beta}[|\mu|]$, which measures the posterior preference for one component mean over the symmetric configuration. At low inverse temperature the posterior explores both symmetric modes. As $\beta$ increases the posterior concentrates on one mode, producing spontaneous symmetry breaking. The middle panel shows the susceptibility $\chi(\beta) = \beta \mathrm{Var}(|\mu|)$, which peaks near the transition where the posterior fluctuates between symmetric configurations. The bottom panel shows the WAIC complexity $\log(1+p_{\mathrm{WAIC}}(\beta)/n)$. Predictive variance decreases as the posterior concentrates, indicating reduced predictive uncertainty once symmetry is broken. The vertical dashed line marks the temperature where susceptibility is maximal.
  • Figure 3: Response hierarchy for reduced-rank regression. The order parameter $m(\beta)=\mathbb{E}_\beta[s_2(B)]$ tracks the second singular value of the regression matrix. As $\beta$ increases, posterior concentration drives the second singular value toward zero, indicating collapse to a lower-rank model. The susceptibility $\chi(\beta)=\beta\mathrm{Var}(s_2)$ measures fluctuations in the effective rank and peaks near the temperature where rank collapse occurs. The WAIC complexity decreases as the posterior eliminates redundant directions in parameter space. The alignment between susceptibility and predictive complexity illustrates how singular structure controls predictive variability.
  • Figure 4: Response hierarchy for the neural network hidden-unit collapse experiment. The order parameter $m(\beta)=\mathbb{E}_\beta[N_{\mathrm{eff}}]$ measures the effective number of active hidden units. Although the network contains $H=10$ units, the posterior favors a smaller effective number as $\beta$ increases. Redundant hidden units become inactive due to symmetry and scaling degeneracies. The susceptibility $\chi(\beta)=\beta\mathrm{Var}(N_{\mathrm{eff}})$ peaks when multiple configurations with different numbers of active units coexist. This region corresponds to maximal posterior uncertainty over network representations. The WAIC complexity decreases as redundant units collapse, indicating that predictive uncertainty is highest when the network's internal representation is unstable.
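
The order parameter and susceptibility described in the Figure 2 caption can be illustrated with a small grid-based sketch. Everything specific here is an assumption for illustration and not the paper's experimental setup: the model $0.5\,\mathcal{N}(\mu,1) + 0.5\,\mathcal{N}(-\mu,1)$, the $\mathcal{N}(0, 3^2)$ prior, the synthetic data, the grid, and the helper names `total_log_lik`, `tempered_weights`, and `response`. The code computes $m(\beta)=\mathbb{E}_\beta[|\mu|]$ and $\chi(\beta)=\beta\,\mathrm{Var}_\beta(|\mu|)$, and compares a finite-difference derivative of $m$ with the covariance of $|\mu|$ and the total log-likelihood. With the log-likelihood (rather than a loss) in the exponent, the covariance enters with a plus sign.

```python
import numpy as np

def total_log_lik(mu_grid, x):
    """Total log-likelihood of 0.5*N(mu,1) + 0.5*N(-mu,1) at each grid value of mu."""
    a = -0.5 * (x[None, :] - mu_grid[:, None]) ** 2
    b = -0.5 * (x[None, :] + mu_grid[:, None]) ** 2
    per_point = np.logaddexp(a, b) - np.log(2.0) - 0.5 * np.log(2.0 * np.pi)
    return per_point.sum(axis=1)

def tempered_weights(loglik, log_prior, beta):
    """Normalized weights of the tempered posterior on the grid."""
    logw = beta * loglik + log_prior
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return w / w.sum()

def response(f, loglik, log_prior, beta):
    """Return (order parameter, susceptibility, Cov_beta(f, log-likelihood))."""
    w = tempered_weights(loglik, log_prior, beta)
    m = np.sum(w * f)
    chi = beta * np.sum(w * (f - m) ** 2)
    mean_ll = np.sum(w * loglik)
    cov = np.sum(w * (f - m) * (loglik - mean_ll))
    return m, chi, cov

# Synthetic data (assumed): true component means at +1 and -1.
rng = np.random.default_rng(0)
n = 50
x = rng.choice([-1.0, 1.0], size=n) + rng.normal(size=n)

mu_grid = np.linspace(-6.0, 6.0, 2001)
log_prior = -0.5 * (mu_grid / 3.0) ** 2   # N(0, 3^2) prior, up to a constant
loglik = total_log_lik(mu_grid, x)
f = np.abs(mu_grid)                       # symmetry-invariant observable |mu|

# Scan inverse temperature and collect the response curves.
betas = np.linspace(0.01, 1.0, 100)
curves = np.array([response(f, loglik, log_prior, b) for b in betas])
m_curve, chi_curve, cov_curve = curves.T

# Covariance identity check: d m / d beta should match Cov_beta(f, log-likelihood).
beta0, h = 0.3, 1e-4
fd = (response(f, loglik, log_prior, beta0 + h)[0]
      - response(f, loglik, log_prior, beta0 - h)[0]) / (2.0 * h)
cov0 = response(f, loglik, log_prior, beta0)[2]
print("finite difference:", fd, "covariance:", cov0)
```

Using $|\mu|$ rather than $\mu$ keeps the observable invariant under the label swap $\mu \mapsto -\mu$, in the spirit of the observable algebra that quotients out non-identifiable directions. Plotting `m_curve` and `chi_curve` against `betas` gives curves of the kind the caption describes, although this toy setup is not guaranteed to reproduce the paper's figures.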

Theorems & Definitions (9)

  • Proposition 1: Observable representation on $\mathcal{M}$
  • proof : Proof sketch
  • Proposition 2: Covariance identity
  • proof
  • Proposition 3: Response identities descend to the model image
  • proof : Proof sketch
  • Theorem 1: Thermodynamic response hierarchy
  • Proposition 4: Response-speed bound
  • proof
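
One way a bound of the kind named in Proposition 4 can arise is by applying the Cauchy-Schwarz inequality to a covariance identity. The statement below is a sketch under assumed notation (tempered posterior $p_\beta \propto \varphi\, e^{-\beta n L_n}$, with $L_n$ the empirical negative log-likelihood per observation, so that $\frac{d}{d\beta}\mathbb{E}_\beta[f] = -n\,\mathrm{Cov}_\beta(f, L_n)$). The paper's exact formulation and constants may differ.

```latex
\begin{align*}
\left|\frac{d}{d\beta}\,\mathbb{E}_\beta[f]\right|
 = n\,\bigl|\mathrm{Cov}_\beta(f, L_n)\bigr|
 \le \sqrt{\mathrm{Var}_\beta(f)}\;\sqrt{n^2\,\mathrm{Var}_\beta(L_n)} .
\end{align*}
```

Here $n^2\,\mathrm{Var}_\beta(L_n)$ is the variance of the total loss $nL_n$. In the usual statistical-physics convention this variance equals the heat capacity up to a factor of $\beta^2$, so under these assumptions no observable can change faster in $\beta$ than its own fluctuation times the energy fluctuation allows.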