
Loss-Complexity Landscape and Model Structure Functions

Alexander Kolpakov

TL;DR

The paper addresses the uncomputability of the Kolmogorov structure function by introducing computable proxies and casting the loss–complexity trade-off as a free-energy optimization problem. It develops a Legendre–Fenchel dual framework, connects it to a statistical-mechanics partition function, and uses Metropolis–Hastings sampling with simulated annealing to approximate the structure function and its dual. A novel information–scattering analogy and a susceptibility-based resonance analysis predict phase-transition-like elbows in model selection, which mark critical loss–complexity trade-offs. Numerical experiments with linear, tree-based, and deep neural network models validate the theory and demonstrate practical model-selection pathways using Bayesian optimizers. The work provides a rigorous, computable lens on generalization and overfitting that can inform principled hyperparameter tuning and architecture choices, and it comes with accessible code and reproducible experiments.
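A minimal sketch of a Metropolis–Hastings search with simulated annealing over a loss–complexity trade-off. It rests on assumptions not taken from the paper: candidate models form a finite ladder indexed 0..n-1 (for example polynomial degrees), each model has a precomputed loss and complexity, the energy of model i is losses[i] + lam * complexities[i], and the proposal moves one step left or right. The function name `anneal_tradeoff` and the geometric cooling schedule are illustrative choices.

```python
import math
import random


def anneal_tradeoff(losses, complexities, lam, n_steps=5000,
                    t0=1.0, t_min=1e-3, seed=0):
    """Return the index minimising losses[i] + lam * complexities[i].

    Hypothetical setup: models sit on a ladder 0..n-1 and the proposal
    moves one step left or right with equal probability (symmetric).
    """
    rng = random.Random(seed)
    n = len(losses)

    def energy(i):
        return losses[i] + lam * complexities[i]

    state = rng.randrange(n)
    best = state
    for step in range(n_steps):
        # Geometric cooling from t0 down to t_min over the run.
        temp = t0 * (t_min / t0) ** (step / max(1, n_steps - 1))
        proposal = state + rng.choice((-1, 1))
        if not 0 <= proposal < n:
            continue  # off-ladder proposal: treated as a rejection
        delta = energy(proposal) - energy(state)
        # Metropolis rule: accept downhill moves, uphill ones w.p. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state = proposal
            if energy(state) < energy(best):
                best = state
    return best
```

For example, with losses [10, 5, 2, 1.5, 1.4, 1.35], complexities 0..5 and lam = 0.6, the penalised energies are 10, 5.6, 3.2, 3.3, 3.8, 4.35, so the search settles on index 2. Sweeping lam from large to small values shows which model is preferred at each trade-off weight; this is a zero-temperature picture, not the paper's exact procedure.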

Abstract

We develop a framework for dualizing the Kolmogorov structure function $h_x(\alpha)$, which makes it possible to use computable complexity proxies. We establish a mathematical analogy between information-theoretic constructs and statistical mechanics, introducing a suitable partition function and free energy functional. We explicitly prove the Legendre–Fenchel duality between the structure function and the free energy, prove detailed balance for the Metropolis kernel, and interpret acceptance probabilities as information-theoretic scattering amplitudes. A susceptibility-like variance of model complexity is shown to peak precisely at loss–complexity trade-offs, which we interpret as phase transitions. Practical experiments with linear and tree-based regression models verify these theoretical predictions, explicitly demonstrating the interplay between model complexity, generalization, and the overfitting threshold.
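To make the partition function, free energy and susceptibility concrete, the sketch below assumes a finite family of models with losses $L_i$ and complexities $C_i$, Gibbs weights $p_i(\lambda) \propto \exp(-(L_i + \lambda C_i)/T)$, free energy $F(\lambda) = -T \log Z(\lambda)$, and susceptibility taken as $\mathrm{Var}_\lambda(C)$. These definitions are assumptions for illustration; the paper's normalization may differ. Under them, $dF/d\lambda = \langle C \rangle_\lambda$ and $d^2F/d\lambda^2 = -\mathrm{Var}_\lambda(C)/T$, so a variance peak marks where the preferred complexity changes fastest. The helper names are hypothetical.

```python
import math


def gibbs_stats(losses, complexities, lam, temp=1.0):
    """Free energy, mean complexity and complexity variance at trade-off lam.

    Assumed model: p_i proportional to exp(-(L_i + lam * C_i) / temp).
    """
    energies = [loss + lam * c for loss, c in zip(losses, complexities)]
    e_min = min(energies)
    # Shift by the minimum energy (log-sum-exp trick) to avoid overflow.
    weights = [math.exp(-(e - e_min) / temp) for e in energies]
    z_shifted = sum(weights)
    free_energy = e_min - temp * math.log(z_shifted)  # F = -T log Z
    probs = [w / z_shifted for w in weights]
    mean_c = sum(p * c for p, c in zip(probs, complexities))
    var_c = sum(p * (c - mean_c) ** 2 for p, c in zip(probs, complexities))
    return free_energy, mean_c, var_c


def susceptibility_peak(losses, complexities, lams, temp=1.0):
    """Grid value of lam at which Var_lam(C) is largest."""
    return max(lams, key=lambda lam: gibbs_stats(losses, complexities, lam, temp)[2])
```

In a toy two-model case with losses [1, 0] and complexities [0, 1], the energies $1$ and $\lambda$ cross at $\lambda = 1$, which is exactly where the variance reaches its maximum of 0.25.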


Paper Structure

This paper contains 24 sections, 4 theorems, 63 equations, 11 figures, 2 algorithms.

Key Result

Theorem 4.1

The functions $\phi(\alpha)$ and $F(\lambda)$ are Legendre–Fenchel duals:
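The sign convention of the duality is not stated here, so the sketch below assumes $F(\lambda) = \min_\alpha [\phi(\alpha) + \lambda\alpha]$ with inverse $\phi(\alpha) = \max_\lambda [F(\lambda) - \lambda\alpha]$, both evaluated on finite grids. In this convention $F(\lambda) = -\phi^*(-\lambda)$, where $\phi^*$ is the standard convex conjugate. The inverse recovers $\phi$ only where $\phi$ is convex and otherwise returns its lower convex envelope. The function names and grids are hypothetical.

```python
def lf_forward(alphas, phi, lams):
    """F(lam) = min over the alpha grid of phi(alpha) + lam * alpha."""
    return [min(p + lam * a for a, p in zip(alphas, phi)) for lam in lams]


def lf_inverse(lams, free, alphas):
    """phi(alpha) = max over the lam grid of F(lam) - lam * alpha."""
    return [max(f - lam * a for lam, f in zip(lams, free)) for a in alphas]


alphas = [0, 1, 2, 3, 4]
lams = [0.25 * k for k in range(25)]      # slopes 0.0 .. 6.0
convex_phi = [8.0, 4.0, 2.0, 1.0, 0.5]    # decreasing and convex
bumpy_phi = [8.0, 7.0, 2.0, 1.0, 0.5]     # non-convex at alpha = 1
print(lf_inverse(lams, lf_forward(alphas, convex_phi, lams), alphas))  # [8.0, 4.0, 2.0, 1.0, 0.5]
print(lf_inverse(lams, lf_forward(alphas, bumpy_phi, lams), alphas))   # [8.0, 5.0, 2.0, 1.0, 0.5]
```

The second line shows the bump at $\alpha = 1$ replaced by the chord from $\alpha = 0$ to $\alpha = 2$: the dual only sees the convex hull of the trade-off curve, and the $\lambda$ grid must include the relevant slopes for the round trip to be exact.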

Figures (11)

  • Figure 1: Loss vs. Complexity for polynomial regression on $f(x)=\sin(6\pi x) + \varepsilon$ with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$
  • Figure 2: Polynomial regression for $f(x)=\sin(6\pi x) + \varepsilon$ with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$ and varying polynomial degrees $d$. Note that the Loss vs. Complexity curve has "elbows" at $d=7$ and $d=9$. There are visible "phase transitions" in the shape of the fitted polynomial relative to the data at $d = 5, 7, 9, 11$; between these values the shape of the regression curve stays roughly the same, and it tends to stabilize after $d=11$.
  • Figure 3: Polynomial regression for $f(x)=\sin(6\pi x) + \varepsilon$ with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$ and varying polynomial degrees $d$. Note that the Loss vs. Complexity curve has "elbows" at $d=7$ and $d=9$. There are visible "phase transitions" in the shape of the fitted polynomial relative to the data at $d = 5, 7, 9, 11$; between these values the shape of the regression curve stays roughly the same, and it tends to stabilize after $d=11$.
  • Figure 4: Loss vs. Complexity for polynomial regression on $f(x)=\sin(4\pi x) + \varepsilon$ with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$
  • Figure 5: Loss vs. Complexity for a decision tree regressor on $f(x)=\sin(4\pi x) + \varepsilon$ with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$. Here $\sigma=0.05$, a low noise level case.
  • ...and 6 more figures
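Loss vs. Complexity curves like those in Figures 1 to 4 can be approximated with a short script. This is a hypothetical reconstruction, not the authors' code: the sample size, the noise level sigma = 0.1, uniform inputs on [0, 1], training MSE as the loss, and the number of coefficients d + 1 as the complexity proxy are all assumptions. Setting freq=6 matches Figures 1 to 3 and freq=4 matches Figure 4.

```python
import numpy as np
from numpy.polynomial import Polynomial


def loss_vs_complexity(degrees, n=100, sigma=0.1, freq=6, seed=0):
    """Training MSE of polynomial fits to sin(freq * pi * x) + noise.

    Assumptions: x ~ Uniform[0, 1], noise ~ N(0, sigma^2), complexity
    proxy = number of coefficients (d + 1). None of these are the paper's.
    """
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(freq * np.pi * x) + rng.normal(0.0, sigma, n)
    curve = []
    for d in degrees:
        # Polynomial.fit maps x onto [-1, 1] before least squares, which
        # keeps high-degree fits better conditioned than raw powers of x.
        p = Polynomial.fit(x, y, d)
        mse = float(np.mean((p(x) - y) ** 2))
        curve.append((d + 1, mse))
    return curve


for complexity, mse in loss_vs_complexity(range(1, 16)):
    print(f"d={complexity - 1:2d}  complexity={complexity:2d}  mse={mse:.4f}")
```

Because the fits are nested least-squares problems, training MSE can only decrease as d grows; elbows show up as degrees where the decrease suddenly flattens. Whether they fall at $d=7$ and $d=9$ as in Figure 2 depends on the sample and noise level chosen.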

Theorems & Definitions (8)

  • Theorem 4.1: Legendre–Fenchel Duality
  • proof
  • Lemma 6.1: Detailed Balance for Metropolis Algorithm
  • proof
  • Theorem 8.1: Acceptance as Scattering Amplitude
  • proof
  • Theorem 9.1: Susceptibility Resonance
  • proof
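Lemma 6.1 can be checked numerically on a small example. The sketch assumes a ladder of models with energies $E_i$, a symmetric proposal that picks each neighbour with probability 1/2, and the Metropolis acceptance $\min(1, e^{-(E_j - E_i)/T})$; these are illustrative choices rather than the paper's exact kernel, and the energies are made up. Detailed balance means the probability flux $\pi_i P_{ij}$ is a symmetric matrix, which in turn makes the Gibbs distribution $\pi$ stationary.

```python
import numpy as np


def metropolis_kernel(energies, temp=1.0):
    """Transition matrix for Metropolis on a ladder of models 0..n-1.

    Hypothetical proposal: each neighbour (i-1, i+1) is proposed with
    probability 1/2; proposals off the ladder are rejected.
    """
    e = np.asarray(energies, dtype=float)
    n = len(e)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                # min(1, exp(-(E_j - E_i)/T)) written to avoid overflow.
                P[i, j] = 0.5 * np.exp(-max(0.0, e[j] - e[i]) / temp)
        P[i, i] = 1.0 - P[i].sum()  # probability of staying put
    return P


def gibbs(energies, temp=1.0):
    """Target distribution pi_i proportional to exp(-E_i / T)."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / temp)
    return w / w.sum()


energies = [3.0, 1.0, 2.0, 0.5, 4.0]
P, pi = metropolis_kernel(energies, temp=0.7), gibbs(energies, temp=0.7)
flux = pi[:, None] * P                # flux[i, j] = pi_i * P_ij
print(np.allclose(flux, flux.T))      # True: detailed balance
print(np.allclose(pi @ P, pi))        # True: pi is stationary
```

Each off-diagonal flux equals $\tfrac12 \min(\pi_i, \pi_j)$, which is symmetric in $i$ and $j$; that identity is the whole content of the detailed-balance argument for this kernel.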