Understanding Representation of Deep Equilibrium Models from Neural Collapse Perspective

Haixiang Sun, Ye Shi

TL;DR

This work analyzes Deep Equilibrium Models (DEQ) through the lens of Neural Collapse (NC) to understand their representation capacity. It proves that NC occurs for DEQ on balanced data and shows that, under mild conditions in imbalanced settings, DEQ features converge toward the vertices of a simplex equiangular tight frame (ETF) with self-duality, an advantage over explicit networks. The authors use a layer-peeled framework to compare DEQ and explicit last-layer mappings, derive lower bounds on the training loss together with the NC-aligned structure that attains them, and validate these results with CIFAR-10/100 experiments in balanced and imbalanced regimes. The findings shed light on why DEQ can provide competitive or superior representations and robustness to class imbalance, and the paper also outlines limitations and directions for extending the theory to broader DEQ architectures and more complex imbalance scenarios.

Abstract

The Deep Equilibrium Model (DEQ), a typical implicit neural network, is notable for its memory efficiency and competitive performance compared to explicit neural networks. However, there has been relatively limited theoretical analysis of the representation of DEQ. In this paper, we use Neural Collapse ($\mathcal{NC}$) as a tool to systematically analyze the representation of DEQ under both balanced and imbalanced conditions. $\mathcal{NC}$ is a phenomenon in the neural network training process that characterizes the geometry of class features and classifier weights. While extensively studied in traditional explicit neural networks, the $\mathcal{NC}$ phenomenon has not received substantial attention in the context of implicit neural networks. We theoretically show that $\mathcal{NC}$ exists in DEQ under balanced conditions. Moreover, in imbalanced settings, despite the presence of minority collapse, DEQ demonstrates advantages over explicit neural networks. These advantages include the convergence of extracted features to the vertices of a simplex equiangular tight frame and self-duality properties under mild conditions, highlighting DEQ's superiority in handling imbalanced datasets. Finally, we validate our theoretical analyses through experiments in both balanced and imbalanced scenarios.
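
To make the "vertices of a simplex equiangular tight frame" concrete, the following is a minimal sketch that builds one such frame and checks its geometry. It assumes the standard construction $\sqrt{K/(K-1)}\,(\boldsymbol I_K - \tfrac{1}{K}\boldsymbol 1\boldsymbol 1^\top)$ in $\mathbb{R}^K$ with no extra rotation; the function names `simplex_etf` and `gram` are illustrative and do not come from the paper.

```python
import math

def simplex_etf(K):
    """Return K unit vectors in R^K (as rows) forming a simplex ETF.

    Assumed construction: sqrt(K / (K - 1)) * (I - (1/K) * ones), no rotation.
    """
    scale = math.sqrt(K / (K - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / K) for j in range(K)]
            for i in range(K)]

def gram(vectors):
    """Matrix of pairwise inner products between the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

K = 4
M = simplex_etf(K)
G = gram(M)

# Every vertex has unit norm, and every pair of distinct vertices has the
# same inner product -1 / (K - 1), the most negative value K unit vectors
# can share pairwise.
for i in range(K):
    for j in range(K):
        expected = 1.0 if i == j else -1.0 / (K - 1)
        assert abs(G[i][j] - expected) < 1e-12
```

Under neural collapse, the centered class means are expected to approach such a configuration up to rotation and scaling, and self-duality means the classifier rows align with those same directions.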

Paper Structure

This paper contains 24 sections, 5 theorems, 76 equations, 6 figures, 5 tables.

Key Result

Theorem 3.1

(Feature collapse of explicit fully connected layers and implicit deep equilibrium models under the balanced setting) Suppose (opt_exp) and (DEQ_NC) reach their minima. Then $\mathcal{NC}1$: for all $k=1,2,\cdots,K$ and $i=1,2,\cdots,n$: where $\boldsymbol h^0_{k}=\sum\limits_{i\in\pi(k)} \boldsymbol h^0_{k,i}$. Similarly, if the model is DEQ, then $\mathcal{NC}2$: The classifier alig
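
$\mathcal{NC}1$ is commonly described as within-class variability collapse: the features of all samples in a class coincide with the class mean. The sketch below measures this with a simplified proxy, the average squared distance of each feature to its class mean. This proxy, the function names, and the toy data are assumptions made for illustration and are not the metric or the data used in the paper.

```python
def class_mean(features):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]

def within_class_variability(features_by_class):
    """Average squared distance of every feature to its own class mean.

    Simplified NC1 proxy (an assumption of this sketch): the value is 0
    exactly when all features of each class are identical.
    """
    total, count = 0.0, 0
    for feats in features_by_class.values():
        mu = class_mean(feats)
        for f in feats:
            total += sum((a - b) ** 2 for a, b in zip(f, mu))
            count += 1
    return total / count

# Toy data: two classes with two samples each (hypothetical values).
collapsed = {0: [[1.0, 0.0], [1.0, 0.0]], 1: [[0.0, 1.0], [0.0, 1.0]]}
spread = {0: [[1.0, 0.0], [0.0, 0.0]], 1: [[0.0, 1.0], [0.0, 3.0]]}

nc1_collapsed = within_class_variability(collapsed)  # fully collapsed features
nc1_spread = within_class_variability(spread)        # features still scattered
```

A value near zero for trained features would be consistent with $\mathcal{NC}1$; a larger value indicates that samples of the same class have not yet collapsed to a single point.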

Figures (6)

  • Figure 1: Illustration of feature extraction. After extracting feature maps $\boldsymbol H^0$, further features $\boldsymbol H$ or $\boldsymbol z^\star$ can be obtained by passing through an explicit neural network or DEQ. The final step involves the classifier to obtain predicted logits. To ensure a fair comparison, we standardize the backbone network and its output $\boldsymbol{H}^0$ across all conditions.
  • Figure 2: Under the imbalanced setting for CIFAR-10 with $K_A=3$ and $R=10$, the disparity in the learned features between Explicit Neural Networks (left) and DEQ (right).
  • Figure 3: Comparison of accuracy and the $\mathcal{NC}$ phenomenon when training on the CIFAR-10 dataset
  • Figure 4: Accuracy and the $\mathcal{NC}$ phenomenon on an imbalanced dataset with $K_A=3$, $K_B=7$, $R=100$
  • Figure 5: Accuracy and the $\mathcal{NC}$ phenomenon on an imbalanced dataset with $K_A=7$, $K_B=3$, $R=100$
  • ...and 1 more figure
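
The pipeline in Figure 1 (backbone features $\boldsymbol H^0$, then a DEQ producing $\boldsymbol z^\star$, then a classifier) can be sketched as follows. This is a minimal illustration under stated assumptions: the layer map $f(\boldsymbol z, \boldsymbol x)=\tanh(\boldsymbol W \boldsymbol z + \boldsymbol x)$, the weights `W`, the feature vector `h0`, and the `classifier` matrix are all hypothetical, and plain fixed-point iteration stands in for the root-finding solvers typically used with DEQs.

```python
import math

def deq_layer(z, x, W):
    """One application of the hypothetical map f(z, x) = tanh(W z + x)."""
    return [math.tanh(sum(w * zj for w, zj in zip(row, z)) + xi)
            for row, xi in zip(W, x)]

def solve_fixed_point(x, W, tol=1e-10, max_iter=500):
    """Iterate z <- f(z, x) from z = 0 until successive iterates agree."""
    z = [0.0] * len(x)
    for _ in range(max_iter):
        z_next = deq_layer(z, x, W)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z

# Hypothetical weights with small row sums so that f is a contraction in z
# (tanh is 1-Lipschitz and the max absolute row sum of W is 0.4 < 1).
W = [[0.2, -0.1], [0.1, 0.3]]
h0 = [0.5, -1.0]  # stands in for one backbone feature vector from H^0

z_star = solve_fixed_point(h0, W)

# Equilibrium check: z_star should satisfy z_star = f(z_star, h0).
residual = max(abs(a - b) for a, b in zip(deq_layer(z_star, h0, W), z_star))

# Final step: a hypothetical linear classifier maps z_star to logits.
classifier = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
logits = [sum(c * z for c, z in zip(row, z_star)) for row in classifier]
```

The equilibrium $\boldsymbol z^\star$ plays the role that the last-layer feature $\boldsymbol H$ plays in the explicit network, which is why the two can be compared on the same backbone output.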

Theorems & Definitions (10)

  • Definition 2.1
  • Definition 2.2
  • Theorem 3.1
  • Theorem 4.1
  • Proposition 4.2
  • Lemma B.1
  • proof
  • Remark B.2
  • proof
  • Theorem B.3