Gradient flow for deep equilibrium single-index models

Sanjit Dandapanthula, Aaditya Ramdas

TL;DR

This work analyzes gradient descent dynamics for deep equilibrium models (DEQs) in two simple but informative settings: linear targets and nonlinear single-index targets. By formulating DEQs via a fixed-point equation and applying implicit differentiation, the authors derive a conservation law for linear DEQs that confines training dynamics to spheres, ensuring the Jacobian remains well-conditioned and the model avoids degenerate fixed points. They prove exponential convergence of gradient flow to the global minimizer for linear DEQs and extend the results to nonlinear single-index models under mild activation and data assumptions, with gradient descent achieving linear convergence under appropriate initialization and step sizes. The theoretical findings are complemented by experiments that validate the predicted dynamics and convergence behavior, providing guidance for initialization and activation choices in practice. Overall, the paper offers rigorous insights into the stability and efficiency of training implicitly defined, infinitely deep networks.
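The fixed-point formulation and implicit differentiation described above can be sketched for a generic linear DEQ. The sketch assumes a hypothetical model $z^\star = W z^\star + U x$ with prediction $v^\top z^\star$ and squared loss. This is an illustration, not the paper's exact parameterization. The forward pass is plain fixed-point iteration, which converges when $\|W\|_2 < 1$. The gradient in $W$ comes from the implicit function theorem, so one linear solve with $(I - W)^\top$ replaces backpropagation through the iterations. The names `fixed_point` and `loss_and_grad_W` are illustrative.

```python
import numpy as np

def fixed_point(W, U, x, tol=1e-12, max_iter=10_000):
    """Solve z = W z + U x by fixed-point iteration (converges if ||W||_2 < 1)."""
    b = U @ x
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = W @ z + b
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

def loss_and_grad_W(W, U, v, x, y):
    """Squared loss 0.5 * (v^T z* - y)^2 and its gradient in W via implicit differentiation."""
    z = fixed_point(W, U, x)
    residual = v @ z - y
    I = np.eye(W.shape[0])
    # Differentiating z* = W z* + U x gives dz* = (I - W)^{-1} (dW) z*, so the
    # gradient is residual * (I - W)^{-T} v z*^T: one adjoint solve, no unrolling.
    a = np.linalg.solve((I - W).T, v)
    grad_W = residual * np.outer(a, z)
    return 0.5 * residual**2, grad_W, z

# Usage with hypothetical random parameters.
rng = np.random.default_rng(0)
d, k = 4, 3                        # hidden width d, input dimension k
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)    # rescale to spectral norm 0.5 (a contraction)
U = rng.standard_normal((d, k))
v = rng.standard_normal(d)
x = rng.standard_normal(k)
loss, grad_W, z = loss_and_grad_W(W, U, v, x, y=1.0)
print(np.allclose(z, np.linalg.solve(np.eye(d) - W, U @ x)))  # fixed point vs. closed form
```

Using `np.linalg.solve` rather than forming $(I - W)^{-1}$ explicitly is the usual numerically safer choice.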

Abstract

Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state-of-the-art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training DEQs remains an area of active research. In this work, we rigorously study the gradient descent dynamics for DEQs in the simple settings of linear models and single-index models, filling several gaps in the literature. We prove a conservation law for linear DEQs which implies that the parameters remain trapped on spheres during training, and we use this property to show that gradient flow remains well-conditioned for all time. We then prove linear convergence of gradient descent to a global minimizer for linear DEQs and deep equilibrium single-index models under appropriate initialization and with a sufficiently small step size. Finally, we validate our theoretical findings through experiments.
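For the single-index setting, one plausible scalar form of the model is $z = \tanh(w z + \theta^\top x)$. This form is assumed here for illustration, and the paper's exact model may differ. Under it, the output depends on $x$ only through the index $\theta^\top x$. Because $|\tanh'| \le 1$, the map is a contraction whenever $|w| < 1$. Differentiating the fixed-point equation gives $\partial z / \partial \theta = \sigma'(p)\, x / (1 - w\,\sigma'(p))$ with $p = w z + \theta^\top x$. The function names below are hypothetical.

```python
import numpy as np

def deq_single_index(theta, w, X, tol=1e-12, max_iter=10_000):
    """Hypothetical scalar DEQ: for each row x of X, z solves z = tanh(w*z + theta^T x).

    For |w| < 1 the map z -> tanh(w*z + s) is a contraction (|tanh'| <= 1),
    so the fixed point exists, is unique, and the iteration converges to it.
    """
    if abs(w) >= 1:
        raise ValueError("need |w| < 1 for the contraction argument")
    s = X @ theta                      # the single index theta^T x per sample
    z = np.zeros_like(s)
    for _ in range(max_iter):
        z_next = np.tanh(w * z + s)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

def implicit_grads(theta, w, X):
    """Fixed point z and its derivatives in theta and w via the implicit function theorem."""
    z = deq_single_index(theta, w, X)
    dsig = 1.0 - z**2                  # tanh'(p) = 1 - tanh(p)^2 = 1 - z^2 at the fixed point
    denom = 1.0 - w * dsig             # positive because |w| < 1 and 0 < dsig <= 1
    dz_dtheta = (dsig / denom)[:, None] * X
    dz_dw = dsig * z / denom
    return z, dz_dtheta, dz_dw

# Usage with hypothetical data.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))        # 5 inputs in R^3
theta = rng.standard_normal(3)
z, dz_dtheta, dz_dw = implicit_grads(theta, 0.5, X)
print(np.max(np.abs(z - np.tanh(0.5 * z + X @ theta))))  # fixed-point residual, near zero
```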

Paper Structure

This paper contains 20 sections, 10 theorems, 102 equations.

Key Result

Theorem 3.1

If $X \in L^2(\mathbb{P})$, then gradient flow for the linear implicit model (eq:linear-implicit-model) satisfies the conservation law for all $t \geq 0$, as long as the gradient flow is well-defined.
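The conservation law in Theorem 3.1 is stated for the paper's own linear DEQ, which this summary does not reproduce. The same phenomenon can be illustrated on an assumed toy model with a scalar hidden state: $z = w z + u^\top x$ and $f(x) = v z = v\, u^\top x / (1 - w)$. For a loss $L = \tfrac12\,\mathbb{E}[(f(X) - Y)^2]$, each of $v\,\partial_v L$, $u^\top \nabla_u L$, and $(1 - w)\,\partial_w L$ equals $\mathbb{E}[(f(X) - Y) f(X)]$. Gradient flow therefore conserves $v^2 + (1 - w)^2$ and $\|u\|^2 + (1 - w)^2$, which keeps $(v, w)$ and $(u, w)$ on spheres centered at $(0, 1)$. The sketch below approximates the flow with RK4 on an empirical loss and checks both invariants numerically. The data, initialization, and function names are hypothetical.

```python
import numpy as np

def grads(v, w, u, X, y):
    """Gradients of L = 0.5 * mean((f - y)^2) for the toy model f(x) = v * u @ x / (1 - w)."""
    s = X @ u                 # the index u^T x for every sample
    c = 1.0 / (1.0 - w)       # d z / d (u^T x) at the fixed point z = u^T x / (1 - w)
    f = v * s * c
    r = f - y
    g_v = np.mean(r * s) * c
    g_w = np.mean(r * f) * c  # uses d f / d w = f / (1 - w)
    g_u = v * c * (X.T @ r) / len(y)
    return g_v, g_w, g_u

def flow_rhs(p, X, y):
    """Gradient flow p' = -grad L for the packed parameters p = (v, w, u_1, ..., u_d)."""
    g_v, g_w, g_u = grads(p[0], p[1], p[2:], X, y)
    return -np.concatenate(([g_v, g_w], g_u))

def integrate_flow(p0, X, y, T=1.0, h=1e-3):
    """Approximate gradient flow on [0, T] with classical fourth-order Runge-Kutta."""
    p = p0.copy()
    for _ in range(int(round(T / h))):
        k1 = flow_rhs(p, X, y)
        k2 = flow_rhs(p + 0.5 * h * k1, X, y)
        k3 = flow_rhs(p + 0.5 * h * k2, X, y)
        k4 = flow_rhs(p + h * k3, X, y)
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

def invariants(p):
    """Quantities conserved by exact gradient flow for this toy model."""
    v, w, u = p[0], p[1], p[2:]
    return v**2 + (1 - w)**2, u @ u + (1 - w)**2

# Usage with hypothetical data: a realizable linear target y = X @ beta.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ (np.ones(d) / np.sqrt(d))
p0 = np.concatenate(([1.0, 0.0], 0.1 * rng.standard_normal(d)))  # v = 1, w = 0, small u
p1 = integrate_flow(p0, X, y)
print(invariants(p0), invariants(p1))  # equal up to RK4 discretization error
```

Under discrete gradient descent these quantities are only approximately conserved, which is why the sketch uses a high-order integrator with a small step.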

Theorems & Definitions (20)

  • Theorem 3.1: Gradient flow is trapped on spheres
  • proof
  • Lemma 3.2: Gradient flow is well-defined
  • proof
  • Theorem 3.3: Gradient flow converges for linear DEQs
  • proof
  • Theorem 3.4: Exponentially fast convergence for linear models
  • proof
  • Lemma 3.5: Descent lemma
  • Theorem 3.6: Gradient descent converges for linear DEQs
  • ...and 10 more
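The gradient descent results (Lemma 3.5 and Theorem 3.6) can be illustrated on an assumed toy linear DEQ, not the paper's setting: $f(x) = v\, u^\top x / (1 - w)$, fitted to a realizable linear target with a fixed step size $\eta = 0.05$. That step size is small relative to the local curvature in this example. With these hypothetical choices the loss should decrease monotonically, as a descent lemma suggests for a sufficiently small step. It should then shrink by a roughly constant factor per step, which is what linear convergence means. The target `beta`, the initialization, and the step size are illustrative assumptions.

```python
import numpy as np

def loss_and_grads(v, w, u, X, y):
    """Loss 0.5 * mean((f - y)^2) and its gradients for f(x) = v * u @ x / (1 - w)."""
    s = X @ u
    c = 1.0 / (1.0 - w)
    f = v * s * c
    r = f - y
    loss = 0.5 * np.mean(r**2)
    g_v = np.mean(r * s) * c
    g_w = np.mean(r * f) * c            # d f / d w = f / (1 - w)
    g_u = v * c * (X.T @ r) / len(y)
    return loss, g_v, g_w, g_u

def gradient_descent(v, w, u, X, y, eta=0.05, steps=1000):
    """Plain gradient descent with a fixed step size; returns the loss history and final parameters."""
    losses = []
    for _ in range(steps):
        loss, g_v, g_w, g_u = loss_and_grads(v, w, u, X, y)
        losses.append(loss)
        v, w, u = v - eta * g_v, w - eta * g_w, u - eta * g_u
    return np.array(losses), (v, w, u)

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
beta = np.ones(d) / np.sqrt(d)          # hypothetical realizable linear target
y = X @ beta
losses, (v, w, u) = gradient_descent(1.0, 0.0, np.full(d, 0.1), X, y)
print(losses[0], losses[50], losses[100])  # roughly geometric decay
```

Before the loss reaches floating-point precision, the ratio `losses[t + 1] / losses[t]` gives an empirical estimate of the linear rate.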