A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

Ming Lei; Shufan Wu; Christophe Baehr

Abstract

This paper introduces a novel optimization framework that fundamentally integrates the Minimum Description Length (MDL) principle into the training dynamics of deep neural networks. Moving beyond its conventional role as a model selection criterion, we reformulate MDL as an active, adaptive driving force within the optimization process itself. The core of our method is a geometrically-grounded cognitive manifold whose evolution is governed by a \textit{coupled Ricci flow}, enriched with a novel \textit{MDL Drive} term derived from first principles. This drive, modulated by the task-loss gradient, creates a seamless harmony between data fidelity and model simplification, actively compressing the internal representation during training. We establish a comprehensive theoretical foundation, proving key properties including the monotonic decrease of description length (Theorem~\ref{thm:convergence}), a finite number of topological phase transitions via a geometric surgery protocol (Theorems~\ref{thm:surgery}, \ref{thm:ultimate_fate}), and the emergence of universal critical behavior (Theorem~\ref{thm:universality}). Furthermore, we provide a practical, computationally efficient algorithm with $O(N \log N)$ per-iteration complexity (Theorem~\ref{thm:complexity}), alongside guarantees for numerical stability (Theorem~\ref{thm:stability}) and exponential convergence under convexity assumptions (Theorem~\ref{thm:convergence_rate}). Empirical validation on synthetic regression and classification tasks confirms the theoretical predictions, demonstrating the algorithm's efficacy in achieving robust generalization and autonomous model simplification. This work provides a principled path toward more autonomous, generalizable, and interpretable AI systems by unifying geometric deep learning with information-theoretic principles.

Paper Structure (20 sections, 8 theorems, 16 equations, 1 figure, 1 algorithm)

Introduction
Related Work
Geometric Deep Learning and Learning on Manifolds
The Minimum Description Length Principle in Optimization
Ricci Flow and Its Applications in Machine Learning
Formal Methods and AI Safety
Theoretical Framework
Core Definitions
The Dynamics: Axiom of the MDL Drive
Main Theoretical Results
Algorithm
Algorithmic Implementation
Theoretical Performance Analysis
Discussion of Performance Results
Simulation Verification
...and 5 more sections

Key Result

Theorem 4.1

Let $\mathcal{M}(t)$ be a solution to the flow defined in Axiom~\ref{axiom:mdl-drive}. Then the time derivative of the description length functional is non-positive almost everywhere: $\frac{d}{dt} L_M(\mathcal{M}(t)) \le 0$. Furthermore, the inequality is strict, $\frac{d}{dt} L_M(\mathcal{M}(t)) < 0$, whenever the functional gradient $\frac{\delta L_M}{\delta \mathbf{g}}$ is non-vanishing and the adaptive weight $\eta(t) > 0$.
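
As a rough illustration of why a gradient-type drive yields this kind of monotonicity, suppose (an assumption made here for illustration, since the full flow equation is not reproduced in this summary) that the metric evolves as $\partial_t \mathbf{g} = -\eta(t)\,\frac{\delta L_M}{\delta \mathbf{g}} + \mathbf{V}$, where $\mathbf{V}$ is a hypothetical symbol collecting all remaining terms of the flow, and let $\langle \cdot, \cdot \rangle$ denote an $L^2$ inner product on symmetric 2-tensors over $\mathcal{M}(t)$, with induced norm $\|\cdot\|$. The chain rule then gives

\[
\frac{d}{dt} L_M(\mathcal{M}(t))
= \left\langle \frac{\delta L_M}{\delta \mathbf{g}},\, \partial_t \mathbf{g} \right\rangle
= -\eta(t) \left\| \frac{\delta L_M}{\delta \mathbf{g}} \right\|^2
+ \left\langle \frac{\delta L_M}{\delta \mathbf{g}},\, \mathbf{V} \right\rangle .
\]

The first term is strictly negative exactly when $\eta(t) > 0$ and $\frac{\delta L_M}{\delta \mathbf{g}}$ is non-vanishing, which matches the strictness condition in the theorem. Under this assumed form, the conclusion additionally requires that the second term not outweigh the first; how the paper controls that contribution belongs to its proof and is not shown here.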

Figures (1)

Figure 1: Analysis of the MDL-driven optimization process for the polynomial regression case study. (a) Final fit compared to noisy data and ground truth. (b) Monotonic decrease of the task loss $\mathcal{L}$. (c) Monotonic decrease of the description length $L_M$, validating Theorem~\ref{thm:convergence}. (d) Evolution of model parameters $\theta$ showing convergence. (e) Frobenius norm of the metric tensor $\mathbf{g}$, indicating the evolution of the cognitive manifold's geometry. (f) Smoothed Ricci curvature $R$ over time, exhibiting stability. (g) Norm of the natural gradient $G^{-1}\nabla_\theta\mathcal{L}$. (h) Heatmap of the final metric matrix $\mathbf{g}(6000)$, revealing a structured geometry. (i) Absolute change in the metric from its initial state $|\mathbf{g}(6000) - \mathbf{I}_4|$.

Theorems & Definitions (19)

Definition 1: Cognitive Manifold $\mathcal{M}$
Definition 2: Description Length Functional $L_M$
Definition 3: Adaptive Weights $\eta(t)$, $\kappa(t)$
Theorem 4.1: Monotonicity of Description Length
proof : Proof Sketch
Theorem 4.2: Computational Complexity
proof : Proof Sketch
Theorem 4.3: Necessity of Surgery for Topological Change
proof : Proof Sketch
Theorem 4.4: Emergence of Critical Slowing Down
...and 9 more
