Understanding Representation of Deep Equilibrium Models from Neural Collapse Perspective
Haixiang Sun, Ye Shi
TL;DR
This work analyzes Deep Equilibrium Models (DEQs) through the lens of Neural Collapse (NC) to understand their representation capacity. It proves that NC occurs in DEQs trained on balanced data. It also shows that, in imbalanced settings and under mild conditions, DEQ features converge toward the vertices of a simplex equiangular tight frame (ETF) and exhibit self-duality, which is an advantage over explicit networks. The authors use a layer-peeled framework to compare DEQ and explicit last-layer mappings. With it they derive lower bounds on the training loss and characterize the NC-aligned structure of the features. They validate these results with CIFAR-10/100 experiments in balanced and imbalanced regimes. The findings help explain why DEQs can offer competitive or superior representations and robustness to class imbalance. The authors also outline limitations and directions for extending the theory to broader DEQ architectures and more complex imbalance scenarios.
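The following is a minimal pure-Python sketch of the target geometry. It is an illustration only and is not taken from the paper. It builds the K vertices of a simplex ETF with the standard construction $M = \sqrt{K/(K-1)}\,(I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$, assuming the feature dimension equals K so the partial orthogonal factor is the identity. It then checks the two properties named above:

- equal norms with pairwise cosine $-1/(K-1)$;
- self-duality, meaning each classifier weight points in the same direction as its class mean.

The names `simplex_etf`, `is_simplex_etf`, and `is_self_dual` are hypothetical. The class means are assumed to be already centered by the global feature mean.

```python
import math
from typing import List

Vector = List[float]


def simplex_etf(K: int) -> List[Vector]:
    """Return the K vertices (columns of M) of a simplex ETF embedded in R^K.

    Assumption: M = sqrt(K/(K-1)) * (I - (1/K) 1 1^T), i.e. the partial
    orthogonal factor is taken to be the identity (feature dim d = K).
    """
    if K < 2:
        raise ValueError("a simplex ETF needs at least K = 2 classes")
    scale = math.sqrt(K / (K - 1))
    return [[scale * ((1.0 if i == k else 0.0) - 1.0 / K) for i in range(K)]
            for k in range(K)]


def norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def cosine(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def is_simplex_etf(vectors: List[Vector], tol: float = 1e-9) -> bool:
    """Check equal norms and pairwise cosine -1/(K-1) (maximal equiangular spread)."""
    K = len(vectors)
    norms = [norm(v) for v in vectors]
    equinorm = max(norms) - min(norms) < tol
    target = -1.0 / (K - 1)
    equiangular = all(abs(cosine(vectors[i], vectors[j]) - target) < tol
                      for i in range(K) for j in range(i + 1, K))
    return equinorm and equiangular


def is_self_dual(classifier: List[Vector], class_means: List[Vector],
                 tol: float = 1e-9) -> bool:
    """Self-duality: classifier row w_k is a positive multiple of class mean h_k."""
    return all(abs(cosine(w, h) - 1.0) < tol
               for w, h in zip(classifier, class_means))


if __name__ == "__main__":
    means = simplex_etf(4)                                # centered class means
    weights = [[2.5 * a for a in v] for v in means]       # scaled copy of the means
    print(is_simplex_etf(means), is_self_dual(weights, means))
```

In this toy setup the classifier is simply a rescaled copy of the class means, so it is self-dual by construction. With real trained features, both checks would instead be applied to empirical class means and learned classifier rows, and a looser tolerance would be needed.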
Abstract
Deep Equilibrium Models (DEQs), a typical class of implicit neural networks, are noted for their memory efficiency and competitive performance compared to explicit neural networks. However, there has been relatively limited theoretical analysis of the representation of DEQs. In this paper, we use Neural Collapse ($\mathcal{NC}$) as a tool to systematically analyze the representation of DEQs under both balanced and imbalanced conditions. $\mathcal{NC}$ is a phenomenon in neural network training that characterizes the geometry of class features and classifier weights. While extensively studied in traditional explicit neural networks, $\mathcal{NC}$ has not received substantial attention in the context of implicit neural networks. We theoretically show that $\mathcal{NC}$ exists in DEQs under balanced conditions. Moreover, in imbalanced settings, DEQs demonstrate advantages over explicit neural networks despite the presence of minority collapse. Under mild conditions, their extracted features converge to the vertices of a simplex equiangular tight frame and exhibit self-duality, highlighting DEQs' superiority in handling imbalanced datasets. Finally, we validate our theoretical analyses through experiments in both balanced and imbalanced scenarios.
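An implicit network such as a DEQ does not stack explicit layers. Instead, it defines its representation as a fixed point $z^\star = f_\theta(z^\star, x)$ of a single layer. The toy sketch below illustrates this under stated assumptions and is not the paper's architecture:

- The layer is the hypothetical $f(z, x) = \tanh(Wz + Ux + b)$.
- The weights are made-up values, chosen so that $W$ has max-row-sum norm below 1. This makes the map a contraction, so plain fixed-point iteration converges.
- `deq_layer` and `deq_forward` are illustrative names.

Practical DEQs typically use faster root finders, such as Broyden's method or Anderson acceleration, together with implicit differentiation for the backward pass.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def deq_layer(z: Vector, x: Vector, W: Matrix, U: Matrix, b: Vector) -> Vector:
    """Hypothetical DEQ cell f(z, x) = tanh(W z + U x + b)."""
    return [math.tanh(p + q + c) for p, q, c in zip(matvec(W, z), matvec(U, x), b)]


def deq_forward(x: Vector, W: Matrix, U: Matrix, b: Vector,
                tol: float = 1e-10, max_iter: int = 500) -> Vector:
    """Solve z* = f(z*, x) by naive fixed-point iteration starting from z = 0.

    Converges when f is a contraction in z (here: tanh is 1-Lipschitz and
    the max-row-sum norm of W is assumed to be < 1).
    """
    z = [0.0] * len(b)
    for _ in range(max_iter):
        z_next = deq_layer(z, x, W, U, b)
        if max(abs(p - q) for p, q in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Made-up parameters for illustration only.
    W = [[0.2, -0.1], [0.05, 0.3]]   # row sums of |W| are 0.3 and 0.35 (< 1)
    U = [[1.0, 0.5], [-0.3, 0.8]]
    b = [0.1, -0.2]
    x = [0.4, -0.7]
    z_star = deq_forward(x, W, U, b)
    residual = max(abs(p - q) for p, q in zip(deq_layer(z_star, x, W, U, b), z_star))
    print(z_star, residual)
```

The same equilibrium is reached from any starting point because of the contraction assumption. Only the final $z^\star$ needs to be stored, not the iterates, and this is the source of the memory efficiency mentioned above.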
