Reversible Deep Equilibrium Models
Sam McCallum, Kamran Arora, James Foster
TL;DR
This work addresses the training instability and approximate gradients of Deep Equilibrium Models (DEQs) by introducing Reversible Deep Equilibrium Models (RevDEQs). RevDEQs use an algebraically reversible fixed-point solver to compute exact gradients with constant memory, which removes the need for regularisation and reduces the number of function evaluations. Empirical results on language modelling (Wikitext-103) and image classification (CIFAR-10) show that RevDEQs outperform comparable implicit and explicit baselines, sometimes matching or exceeding strong ResNet/Transformer performance with far fewer evaluations. The approach highlights the practical potential of reversible implicit depth, suggesting broad applicability and improved GPU efficiency for large-scale tasks.
Abstract
Deep Equilibrium Models (DEQs) are a class of implicit models in which the model output is defined as the fixed point of a learned function. These models have been shown to outperform explicit (fixed-depth) models in large-scale tasks by trading many deep layers for a single layer that is iterated many times. However, gradient calculation through DEQs is approximate. This often leads to unstable training dynamics, which require regularisation or many function evaluations to stabilise. Here, we introduce Reversible Deep Equilibrium Models (RevDEQs), which allow exact gradient calculation, require no regularisation, and use far fewer function evaluations than DEQs. We show that RevDEQs significantly improve performance on language modelling and image classification tasks against comparable implicit and explicit models.
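To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes a hypothetical scalar layer `f(z, x) = tanh(w*z + x)` with `|w| < 1` so that a unique fixed point exists. The function `fixed_point` illustrates the standard DEQ forward pass (iterate the layer until the state stops changing). The functions `reversible_step` and `reversible_step_inverse` illustrate one possible coupled, relaxed update whose previous state can be recovered in closed form; the specific update rule and the relaxation parameter `beta` are assumptions made for illustration, since this section does not specify the solver.

```python
import math


def f(z, x, w=0.5):
    """Hypothetical toy layer. |df/dz| <= |w| < 1, so it is a contraction."""
    return math.tanh(w * z + x)


def fixed_point(x, z0=0.0, tol=1e-10, max_iter=200):
    """Plain fixed-point iteration z <- f(z, x), as in a DEQ forward pass."""
    z = z0
    for n in range(1, max_iter + 1):
        z_next = f(z, x)
        if abs(z_next - z) < tol:
            return z_next, n
        z = z_next
    return z, max_iter


def reversible_step(y, z, x, beta=0.8):
    """One forward step of a coupled relaxed update (assumed form)."""
    y_next = (1.0 - beta) * y + beta * f(z, x)
    z_next = (1.0 - beta) * z + beta * f(y_next, x)
    return y_next, z_next


def reversible_step_inverse(y_next, z_next, x, beta=0.8):
    """Closed-form inverse of reversible_step: recovers (y, z) exactly in
    exact arithmetic, so intermediate states need not be stored."""
    z = (z_next - beta * f(y_next, x)) / (1.0 - beta)
    y = (y_next - beta * f(z, x)) / (1.0 - beta)
    return y, z


x = 0.3
z_star, n_iters = fixed_point(x)

# Run a few reversible steps forward from (0, 0) ...
y, z = 0.0, 0.0
n_steps = 8
for _ in range(n_steps):
    y, z = reversible_step(y, z, x)
y_final, z_final = y, z

# ... then walk back to the initial state using only the final state.
for _ in range(n_steps):
    y, z = reversible_step_inverse(y, z, x)
y_rec, z_rec = y, z
```

In this sketch, `y_final` and `z_final` approach the same fixed point `z_star`, and `(y_rec, z_rec)` returns to the initial state `(0, 0)` up to floating-point error. Because the backward pass can reconstruct each intermediate state from the next one, a gradient computed along the reversed trajectory would match the gradient of the forward computation without storing it, which is the sense in which a reversible solver can give exact gradients with constant memory. Note that the inverse divides by `1 - beta`, so rounding errors grow during reconstruction; in this toy setting that limits how many steps can be reversed accurately.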
