Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures
Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao
TL;DR
This work uses random matrix theory to analyze the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices of Deep Equilibrium Models (DEQs) when the inputs are drawn from a high-dimensional Gaussian mixture. It shows that these CK and NTK matrices depend on the activation function and the initial weight variances only through a system of four nonlinear equations, which gives a principled route to building equivalent shallow explicit networks via CK/NTK matching. The authors provide a practical recipe for designing the activations of explicit networks so that they reproduce the spectral properties of a given DEQ, and validate it on Gaussian mixtures and on real datasets such as MNIST, Fashion-MNIST, and CIFAR-10. Overall, the paper establishes a high-dimensional equivalence between implicit DEQs and shallow explicit networks, which suggests potential computational savings and offers guidance for network design.
Abstract
Deep equilibrium models (DEQs), a typical class of implicit neural networks, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis of the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove, in this setting, that the spectral behavior of these Implicit-CKs and NTKs depends on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Although derived here for Gaussian mixture data, the proposed theory and design principle are shown empirically to also apply to popular real-world datasets.
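To make the objects named in the abstract concrete, the following is a minimal NumPy sketch of a generic DEQ forward pass on two-class Gaussian mixture data, followed by the empirical conjugate kernel (the Gram matrix of the equilibrium features). It is an illustration under stated assumptions, not the authors' exact setup: the tanh activation, the scalings by sqrt(m) and sqrt(p), the values of the hypothetical parameters `sigma_a` and `sigma_b`, the mixture means, and the plain fixed-point iteration are all choices made here for simplicity.

```python
import numpy as np

def gaussian_mixture(n=100, p=64, seed=0):
    """Two-class mixture N(+mu, I) / N(-mu, I); the mean mu is an illustrative choice."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    mu = np.zeros(p)
    mu[0] = 2.0
    signs = 2 * labels - 1                      # map {0, 1} -> {-1, +1}
    X = rng.standard_normal((p, n)) + np.outer(mu, signs)
    return X, labels

def deq_fixed_point(X, m=256, sigma_a=0.3, sigma_b=1.0, n_iter=100, seed=1):
    """Iterate Z <- tanh(sigma_a * A Z / sqrt(m) + sigma_b * B X / sqrt(p)).

    Assumption: sigma_a is kept small so that the map is a contraction
    (tanh is 1-Lipschitz and ||A|| / sqrt(m) is close to 2 for Gaussian A),
    which lets a plain fixed-point iteration converge.
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    A = rng.standard_normal((m, m))             # random recurrent weights
    B = rng.standard_normal((m, p))             # random input-injection weights
    injection = sigma_b * (B @ X) / np.sqrt(p)
    Z = np.zeros((m, n))
    residual = np.inf
    for _ in range(n_iter):
        Z_next = np.tanh(sigma_a * (A @ Z) / np.sqrt(m) + injection)
        residual = np.max(np.abs(Z_next - Z))   # change between successive iterates
        Z = Z_next
    return Z, residual

X, labels = gaussian_mixture()
Z, residual = deq_fixed_point(X)

# Empirical conjugate kernel: Gram matrix of the equilibrium features.
K = Z.T @ Z / Z.shape[0]
eigvals = np.linalg.eigvalsh(K)                 # eigenspectrum of the empirical CK
```

The array `eigvals` holds the eigenspectrum of one random realization of the kernel matrix; the abstract's result concerns the high-dimensional behavior of such spectra and the design of a shallow explicit network whose kernel matches it.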
