Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

Arthur Jacot

TL;DR

The paper investigates why deep networks generalize so well by connecting the computation of real-valued functions to circuit-size minimization. It introduces the HTMC norm $\|f\|_{H^{\gamma}}$ (for $\gamma>2$) and a ResNet-based complexity $\|f\|_{R^{\omega}}$, then proves a sandwich bound linking the two, suggesting that DNN optimization effectively searches for near-minimal circuits within a convex function-space regime. Key contributions are the HTMC convexity result and the construction of Tetrakis functions, which approximate the vertices of the HTMC ball and enable a constructive proof of the right-hand side of the sandwich bound via ResNets. The work also provides PAC generalization guarantees in terms of the HTMC norm and formalizes a pathway to convex optimization for circuit-size minimization through ResNet architectures. Overall, the results offer a principled framework for viewing DNN training as implicitly solving minimal-circuit problems, with potential implications for convergence proofs and compositional learning.

Abstract

This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued functions $f$ that can be $\epsilon$-approximated with a binary circuit of size at most $c\epsilon^{-\gamma}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $\gamma>2$, allowing for the definition of an HTMC norm on functions. In parallel, one can define a complexity measure on the parameters of a ResNet (a weighted $\ell_1$ norm of the parameters), which induces a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.
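The threshold $\gamma=2$ coincides with the classical Monte Carlo rate: averaging $N$ independent samples gives an error of order $N^{-1/2}$, so reaching accuracy $\epsilon$ costs about $\epsilon^{-2}$ samples. Reading the name `Harder than Monte Carlo' this way is an assumption on our part; the abstract does not spell it out. The sketch below only illustrates the $N^{-1/2}$ rate on a toy problem (averaging fair $\pm1$ draws, whose true mean is 0). The function name `monte_carlo_rmse` and its parameters are hypothetical and not taken from the paper.

```python
import math
import random


def monte_carlo_rmse(num_samples, num_trials=2000, seed=0):
    """Root-mean-square error of the average of num_samples fair +/-1 draws.

    The true mean is 0, so the squared average is the squared error.
    Toy illustration only; not the paper's construction.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_sq_error = 0.0
    for _ in range(num_trials):
        draws = (rng.choice((-1.0, 1.0)) for _ in range(num_samples))
        estimate = sum(draws) / num_samples
        total_sq_error += estimate * estimate
    return math.sqrt(total_sq_error / num_trials)


if __name__ == "__main__":
    # If the error scales like N^{-1/2}, the last column stays near 1.
    for n in (16, 64, 256):
        err = monte_carlo_rmse(n)
        print(f"N={n:4d}  rmse={err:.4f}  rmse*sqrt(N)={err * math.sqrt(n):.3f}")
```

Multiplying the sample count by 16 should shrink the error by roughly a factor of 4, which is the $\epsilon^{-2}$ cost in reverse.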

Paper Structure

This paper contains 15 sections, 14 theorems, 86 equations, 1 figure.

Key Result

Theorem 1

For $\gamma>2$, there is a constant $c_{\gamma}$ such that for all $m\geq1$, $\left\Vert \sum_{i=1}^{m}f_{i}\right\Vert _{H^{\gamma}}\leq c_{\gamma}\sum_{i=1}^{m}\left\Vert f_{i}\right\Vert _{H^{\gamma}}$.
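To see why a constant independent of $m$ matters, compare with what a plain two-term quasi-triangle inequality would give. The following is a sketch under the illustrative assumption that one only knows $\Vert f+g\Vert\leq c(\Vert f\Vert+\Vert g\Vert)$ for some $c\geq1$; the constant $c$ and the balanced splitting are ours, not the paper's.

```latex
% Split m = 2^k terms into two halves and apply the two-term bound recursively.
\left\Vert \sum_{i=1}^{2^{k}} f_{i} \right\Vert
  \leq c\left( \left\Vert \sum_{i=1}^{2^{k-1}} f_{i} \right\Vert
             + \left\Vert \sum_{i=2^{k-1}+1}^{2^{k}} f_{i} \right\Vert \right)
  \leq \dots
  \leq c^{k} \sum_{i=1}^{2^{k}} \left\Vert f_{i} \right\Vert
  = m^{\log_{2} c} \sum_{i=1}^{m} \left\Vert f_{i} \right\Vert .
```

The prefactor $m^{\log_{2}c}$ grows with the number of terms whenever $c>1$, whereas Theorem 1 bounds the sum with the single constant $c_{\gamma}$ for every $m\geq1$.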

Figures (1)

  • Figure 1: The Tetrakis triangulation in 2D.

Theorems & Definitions (28)

  • Theorem 1
  • proof: Sketch of proof
  • Theorem 2
  • proof: Structure of the proof
  • Proposition 3
  • proof
  • Theorem 4
  • proof
  • Proposition 5
  • proof
  • ...and 18 more