Stabilizing Equilibrium Models by Jacobian Regularization

Shaojie Bai, Vladlen Koltun, J. Zico Kolter

TL;DR

This paper addresses instability and inefficiency in Deep Equilibrium Networks (DEQs) by introducing a Jacobian-based regularization that targets the conditioning of the DEQ's forward and backward dynamics. By penalizing the Jacobian at the equilibrium via a Hutchinson-based Frobenius-norm surrogate, the approach stabilizes convergence, reduces the required fixed-point iterations, and allows DEQs to scale to large tasks with competitive accuracy and constant memory. Empirical results on synthetic data, WikiText-103, CIFAR-10, and ImageNet demonstrate significant speedups (2x–3x) and improved robustness to architectural choices, making implicit-depth models more practical. The method preserves the memory advantages of DEQs and narrows the gap with explicit architectures, though it does not completely eliminate instability and requires careful scheduling of the regularization strength.
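The Hutchinson-based surrogate mentioned above rests on the identity $\|J\|_F^2 = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\left[\|\epsilon^\top J\|_2^2\right]$, so the squared Frobenius norm can be estimated from a few vector-Jacobian products without ever forming $J$. The following is a minimal sketch of that estimator, not the authors' implementation: the small dense matrix `J` and the helper names `vjp`, `frobenius_sq`, and `hutchinson_frobenius_sq` are illustrative assumptions, and in a real DEQ the product $\epsilon^\top J$ would presumably come from automatic differentiation at the equilibrium rather than an explicit matrix.

```python
import random

def frobenius_sq(J):
    """Exact squared Frobenius norm of a dense matrix given as a list of rows."""
    return sum(v * v for row in J for v in row)

def vjp(eps, J):
    """Row vector eps^T J; a stand-in for an autodiff vector-Jacobian product."""
    n_rows, n_cols = len(J), len(J[0])
    return [sum(eps[i] * J[i][j] for i in range(n_rows)) for j in range(n_cols)]

def hutchinson_frobenius_sq(J, num_samples, rng):
    """Monte Carlo estimate of ||J||_F^2 = E[||eps^T J||^2] with Gaussian probes."""
    total = 0.0
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in J]  # one probe entry per row of J
        total += sum(v * v for v in vjp(eps, J))
    return total / num_samples

# Hypothetical 3x3 "Jacobian" used only to check the estimator against the exact value.
J = [[0.5, -0.2, 0.0],
     [0.1, 0.3, 0.4],
     [0.0, -0.6, 0.2]]

rng = random.Random(0)
exact = frobenius_sq(J)
estimate = hutchinson_frobenius_sq(J, 20000, rng)
print(f"exact = {exact:.4f}, Hutchinson estimate = {estimate:.4f}")
```

Many samples are drawn here only so that the estimate can be compared with the exact value; when the estimator is used as a training penalty, far fewer probes per step would typically be used, trading variance for cost.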

Abstract

Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer. These models have been shown to achieve performance competitive with the state-of-the-art deep networks while using significantly less memory. Yet they are also slower, brittle to architectural choices, and introduce potential instability to the model. In this paper, we propose a regularization scheme for DEQ models that explicitly regularizes the Jacobian of the fixed-point update equations to stabilize the learning of equilibrium models. We show that this regularization adds only minimal computational cost, significantly stabilizes the fixed-point convergence in both forward and backward passes, and scales well to high-dimensional, realistic domains (e.g., WikiText-103 language modeling and ImageNet classification). Using this method, we demonstrate, for the first time, an implicit-depth model that runs with approximately the same speed and level of performance as popular conventional deep networks such as ResNet-101, while still maintaining the constant memory footprint and architectural simplicity of DEQs. Code is available at https://github.com/locuslab/deq .

Paper Structure

This paper contains 30 sections, 13 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Visualizations of DEQs' instability and inefficiency problems.
  • Figure 2: Pre- vs. post-LN DEQ-Transformer layer (Xiong et al., 2020). FFN is a 2-layer feed-forward block (Vaswani et al., 2017).
  • Figure 3: Comparing different architectural modifications of a DEQ-Transformer (first 60K steps). The DEQ networks are brittle: even slight modifications such as changing the placement of LayerNorm (see Figure 2) or removing weight normalization can cause the model to quickly diverge during training.
  • Figure 4: Left: when the slope is less than 1, even the simplest iterative application of $f_\theta$ converges. Right: when slope $>1$, the iterative approach may diverge or oscillate, but the fixed point still exists and can be solved for.
  • Figure 5: Top: the surface of the $f_\theta(\mathbf{z};\mathbf{x})$ layer, and the eventual learned equilibria $z^\star(x)$ as a function of $x$. As $\gamma$ grows, the surface is "lifted up" and becomes flat in the $z$-direction. Bottom: each unique input $x$ defines a slice of the surface, and we perform fixed-point solving on this slice; larger $\gamma$ values flatten the curve and significantly accelerate the convergence to equilibrium.
  • ...and 5 more figures
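
The behaviour described in the Figure 4 caption can be reproduced on a scalar toy problem. The sketch below is an illustration under stated assumptions, not code from the paper: the two affine maps `contractive` and `expansive` are hypothetical stand-ins for $f_\theta$, both with fixed point $z^\star = 2$, and Newton's method with a finite-difference slope is swapped in as a generic root solver (the caption only says the fixed point "can be solved for").

```python
def iterate(f, z0, steps):
    """Naive fixed-point iteration: z <- f(z), repeated `steps` times."""
    z = z0
    for _ in range(steps):
        z = f(z)
    return z

def newton_fixed_point(f, z0, steps=20, h=1e-6):
    """Solve g(z) = f(z) - z = 0 with Newton's method and a central-difference slope."""
    z = z0
    for _ in range(steps):
        g = f(z) - z
        dg = (f(z + h) - f(z - h)) / (2 * h) - 1.0  # g'(z) = f'(z) - 1
        z = z - g / dg
    return z

# Hypothetical one-dimensional "layers"; both satisfy f(2) = 2.
contractive = lambda z: 0.5 * z + 1.0  # slope 0.5 < 1
expansive = lambda z: 2.0 * z - 2.0    # slope 2 > 1

z_naive_ok = iterate(contractive, 0.0, 50)   # approaches 2
z_naive_bad = iterate(expansive, 0.0, 50)    # runs away from 2
z_newton = newton_fixed_point(expansive, 0.0)  # still recovers 2

print(z_naive_ok, z_naive_bad, z_newton)
```

With slope 0.5 the error halves at every step, while with slope 2 it doubles, so the naive iterate diverges even though the fixed point exists and a root solver finds it.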