Stabilizing Equilibrium Models by Jacobian Regularization
Shaojie Bai, Vladlen Koltun, J. Zico Kolter
TL;DR
This paper addresses instability and inefficiency in Deep Equilibrium Networks (DEQs) by introducing a Jacobian-based regularization that targets the conditioning of the DEQ's forward and backward dynamics. By penalizing the Jacobian at the equilibrium via a Hutchinson-based Frobenius-norm surrogate, the approach stabilizes convergence, reduces the required fixed-point iterations, and allows DEQs to scale to large tasks with competitive accuracy and constant memory. Empirical results on synthetic data, WikiText-103, CIFAR-10, and ImageNet demonstrate significant speedups (2x–3x) and improved robustness to architectural choices, making implicit-depth models more practical. The method preserves the memory advantages of DEQs and narrows the gap with explicit architectures, though it does not completely eliminate instability and requires careful scheduling of the regularization strength.
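As a rough illustration of the Hutchinson-based Frobenius-norm surrogate mentioned above, the sketch below estimates the squared Frobenius norm of a Jacobian as the average of ||eps^T J||^2 over random Gaussian vectors eps, and compares it with the exact value. The toy layer f(z) = tanh(W z + b), the helper names, the weight scale, and the sample count are all assumptions made for this example and are not taken from the paper. The vector-Jacobian product is written out by hand here; in a real model it would presumably come from automatic differentiation.

```python
import math
import random

def layer(W, b, z):
    # Hypothetical toy layer f(z) = tanh(W z + b); W is a list of rows.
    return [math.tanh(sum(w * zj for w, zj in zip(row, z)) + bi)
            for row, bi in zip(W, b)]

def solve_fixed_point(W, b, n_iter=200):
    # Naive forward iteration z <- f(z), starting from zero.
    z = [0.0] * len(b)
    for _ in range(n_iter):
        z = layer(W, b, z)
    return z

def exact_frobenius_sq(W, z_star):
    # At the fixed point, J = diag(1 - z*^2) W, because tanh' = 1 - tanh^2
    # and z* = tanh(W z* + b).
    return sum(((1.0 - zi * zi) * w) ** 2
               for row, zi in zip(W, z_star) for w in row)

def hutchinson_frobenius_sq(W, z_star, n_samples, rng):
    # Monte Carlo estimate of tr(J J^T) = E[||eps^T J||^2] with eps ~ N(0, I).
    d = len(z_star)
    total = 0.0
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Vector-Jacobian product eps^T J for this specific layer.
        scaled = [e * (1.0 - zi * zi) for e, zi in zip(eps, z_star)]
        vjp = [sum(scaled[i] * W[i][j] for i in range(d)) for j in range(d)]
        total += sum(v * v for v in vjp)
    return total / n_samples

rng = random.Random(0)
# Entries bounded by 0.2 keep every row sum below 1, so the iteration contracts.
W = [[0.2 * rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
b = [rng.uniform(-1, 1) for _ in range(4)]
z_star = solve_fixed_point(W, b)
exact = exact_frobenius_sq(W, z_star)
estimate = hutchinson_frobenius_sq(W, z_star, n_samples=20000, rng=rng)
print(exact, estimate)
```

The estimator only needs vector-Jacobian products, never the full Jacobian, which is why a surrogate of this kind can stay cheap in high dimensions. The large sample count is used here only to make the agreement with the exact value visible.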
Abstract
Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer. These models have been shown to achieve performance competitive with the state-of-the-art deep networks while using significantly less memory. Yet they are also slower, brittle to architectural choices, and introduce potential instability to the model. In this paper, we propose a regularization scheme for DEQ models that explicitly regularizes the Jacobian of the fixed-point update equations to stabilize the learning of equilibrium models. We show that this regularization adds only minimal computational cost, significantly stabilizes the fixed-point convergence in both forward and backward passes, and scales well to high-dimensional, realistic domains (e.g., WikiText-103 language modeling and ImageNet classification). Using this method, we demonstrate, for the first time, an implicit-depth model that runs with approximately the same speed and level of performance as popular conventional deep networks such as ResNet-101, while still maintaining the constant memory footprint and architectural simplicity of DEQs. Code is available at https://github.com/locuslab/deq .
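To make the link between the Jacobian and fixed-point convergence concrete, here is a minimal one-dimensional sketch. It runs naive iteration on a hypothetical scalar layer f(z) = tanh(w z + b) and counts updates until the residual falls below a tolerance. The layer, the values of w and b, and the tolerance are assumptions chosen for illustration. The slope w is shrunk by hand rather than through a trained regularizer, so this shows only the qualitative effect of a smaller Jacobian, not the paper's method.

```python
import math

def iterations_to_converge(w, b, tol=1e-8, max_iter=1000):
    # Naive iteration z <- tanh(w z + b) from z = 0; returns the number of
    # updates until the residual |f(z) - z| drops below tol.
    z = 0.0
    for k in range(1, max_iter + 1):
        z_next = math.tanh(w * z + b)
        if abs(z_next - z) < tol:
            return k
        z = z_next
    return max_iter

# The Jacobian of this layer is w * (1 - tanh(w z + b)^2), so a smaller w
# gives a smaller Jacobian at the equilibrium.
iters_large = iterations_to_converge(w=0.9, b=0.1)
iters_small = iterations_to_converge(w=0.3, b=0.1)
print(iters_large, iters_small)
```

The layer with the smaller slope contracts faster and reaches the tolerance in fewer updates, which is the behavior a Jacobian penalty is meant to encourage at the equilibrium.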
