
A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

Ming Lei, Shufan Wu, Christophe Baehr

Abstract

This paper introduces a novel optimization framework that fundamentally integrates the Minimum Description Length (MDL) principle into the training dynamics of deep neural networks. Moving beyond its conventional role as a model selection criterion, we reformulate MDL as an active, adaptive driving force within the optimization process itself. The core of our method is a geometrically-grounded cognitive manifold whose evolution is governed by a \textit{coupled Ricci flow}, enriched with a novel \textit{MDL Drive} term derived from first principles. This drive, modulated by the task-loss gradient, creates a seamless harmony between data fidelity and model simplification, actively compressing the internal representation during training. We establish a comprehensive theoretical foundation, proving key properties including the monotonic decrease of description length (Theorem~\ref{thm:convergence}), a finite number of topological phase transitions via a geometric surgery protocol (Theorems~\ref{thm:surgery}, \ref{thm:ultimate_fate}), and the emergence of universal critical behavior (Theorem~\ref{thm:universality}). Furthermore, we provide a practical, computationally efficient algorithm with $O(N \log N)$ per-iteration complexity (Theorem~\ref{thm:complexity}), alongside guarantees for numerical stability (Theorem~\ref{thm:stability}) and exponential convergence under convexity assumptions (Theorem~\ref{thm:convergence_rate}). Empirical validation on synthetic regression and classification tasks confirms the theoretical predictions, demonstrating the algorithm's efficacy in achieving robust generalization and autonomous model simplification. This work provides a principled path toward more autonomous, generalizable, and interpretable AI systems by unifying geometric deep learning with information-theoretic principles.

Paper Structure

This paper contains 20 sections, 8 theorems, 16 equations, 1 figure, and 1 algorithm.

Key Result

Theorem 4.1

Let $\mathcal{M}(t)$ be a solution to the flow defined in the MDL Drive axiom. Then the time derivative of the description length functional is non-positive almost everywhere: $\frac{d}{dt} L_M(\mathcal{M}(t)) \le 0$. Furthermore, the inequality is strict, $\frac{d}{dt} L_M(\mathcal{M}(t)) < 0$, whenever the functional gradient $\frac{\delta L_M}{\delta \mathbf{g}}$ is non-vanishing and the adaptive weight $\eta(t) > 0$.
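
To see why a statement of this shape is plausible, the following is a minimal sketch of the standard gradient-flow argument, not the paper's proof. It assumes, hypothetically, that the drive contributes a term $-\eta(t)\,\frac{\delta L_M}{\delta \mathbf{g}}$ to $\partial_t \mathbf{g}$ and that $\langle\cdot,\cdot\rangle$ is an $L^2$ inner product on symmetric 2-tensors. The Ricci and task-loss terms of the full coupled flow are ignored here, and the theorem's proof must control them separately.

```latex
% Assumed simplified flow (drive term only; other terms of the coupled flow omitted):
%   \partial_t \mathbf{g} = -\eta(t)\, \frac{\delta L_M}{\delta \mathbf{g}}
\begin{align*}
\frac{d}{dt} L_M(\mathcal{M}(t))
  &= \left\langle \frac{\delta L_M}{\delta \mathbf{g}},\, \partial_t \mathbf{g} \right\rangle
     && \text{(chain rule for functionals)} \\
  &= -\eta(t) \left\| \frac{\delta L_M}{\delta \mathbf{g}} \right\|^2
     && \text{(substitute the assumed flow)} \\
  &\le 0 .
\end{align*}
% Equality holds only if \eta(t) = 0 or the functional gradient vanishes,
% which matches the strictness condition stated in the theorem.
```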

Figures (1)

  • Figure 1: Analysis of the MDL-driven optimization process for the polynomial regression case study. (a) Final fit compared to noisy data and ground truth. (b) Monotonic decrease of the task loss $\mathcal{L}$. (c) Monotonic decrease of the description length $L_M$, validating Theorem 4.1. (d) Evolution of model parameters $\theta$ showing convergence. (e) Frobenius norm of the metric tensor $\mathbf{g}$, indicating the evolution of the cognitive manifold's geometry. (f) Smoothed Ricci curvature $R$ over time, exhibiting stability. (g) Norm of the natural gradient $G^{-1}\nabla_\theta\mathcal{L}$. (h) Heatmap of the final metric matrix $\mathbf{g}(6000)$, revealing a structured geometry. (i) Absolute change in the metric from its initial state $|\mathbf{g}(6000) - \mathbf{I}_4|$.
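
Panels (b) and (g) track the task loss and the norm of the natural gradient $G^{-1}\nabla_\theta\mathcal{L}$. The sketch below shows how those two quantities can be computed for a four-parameter (cubic) polynomial regression. It is an illustration under stated assumptions, not the paper's algorithm: the metric `G` is a fixed, damped Gauss-Newton matrix standing in for the evolving metric $\mathbf{g}$, and the function name, step size, step count, noise level, and ground-truth coefficients are all hypothetical.

```python
import numpy as np

def fit_cubic_natural_gradient(x, y, steps=50, lr=0.5, damping=1e-6):
    """Natural-gradient descent on a cubic polynomial with a fixed metric."""
    # Design matrix with columns 1, x, x^2, x^3 (four parameters).
    X = np.vander(x, N=4, increasing=True)
    n = len(x)
    theta = np.zeros(4)
    # Assumption: a damped Gauss-Newton matrix replaces the paper's evolving metric.
    G = X.T @ X / n + damping * np.eye(4)
    losses, nat_grad_norms = [], []
    for _ in range(steps):
        residual = X @ theta - y
        losses.append(0.5 * np.mean(residual ** 2))
        grad = X.T @ residual / n
        # Solve G v = grad instead of forming G^{-1} explicitly.
        nat_grad = np.linalg.solve(G, grad)
        nat_grad_norms.append(np.linalg.norm(nat_grad))
        theta = theta - lr * nat_grad
    return theta, np.array(losses), np.array(nat_grad_norms)

# Synthetic data: hypothetical ground-truth coefficients plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
true_theta = np.array([0.5, -1.0, 0.3, 2.0])
y = np.vander(x, N=4, increasing=True) @ true_theta + 0.05 * rng.normal(size=x.size)

theta, losses, nat_grad_norms = fit_cubic_natural_gradient(x, y)
print("estimated theta:", theta)
print("first and last loss:", losses[0], losses[-1])
```

Because the loss here is quadratic and `G` nearly equals its Hessian, each step roughly halves the distance to the least-squares solution, so both curves fall monotonically. The paper's method additionally evolves the metric, which this sketch does not attempt.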

Theorems & Definitions (19)

  • Definition 1: Cognitive Manifold $\mathcal{M}$
  • Definition 2: Description Length Functional $L_M$
  • Definition 3: Adaptive Weights $\eta(t)$, $\kappa(t)$
  • Theorem 4.1: Monotonicity of Description Length
  • Proof: Proof Sketch
  • Theorem 4.2: Computational Complexity
  • Proof: Proof Sketch
  • Theorem 4.3: Necessity of Surgery for Topological Change
  • Proof: Proof Sketch
  • Theorem 4.4: Emergence of Critical Slowing Down
  • ...and 9 more