Learning Equi-angular Representations for Online Continual Learning

Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

TL;DR

The paper tackles online continual learning (CL), where single-pass updates keep the model from converging to well-separated representations. It exploits neural collapse by enforcing a simplex equiangular tight frame (ETF) structure in the last-layer representation space, and introduces two mechanisms: preparatory data training, which mitigates the bias of new-class features toward existing classes, and residual correction at inference, which compensates for incomplete convergence. The proposed method, EARL, improves over prior methods on CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K in both disjoint and Gaussian-scheduled setups, with notable gains in anytime inference performance. The approach combines representation-level alignment during training with a lightweight correction step at inference.

Abstract

Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e., boundary-free) data setups.
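
To make the simplex ETF structure mentioned in the abstract concrete, the following is a minimal sketch of one standard way to build such a set of classifier vectors: K unit vectors whose pairwise cosine similarity is -1/(K-1). The function name `simplex_etf` and the choice of the first K standard basis vectors as the orthonormal basis are assumptions made for illustration; the paper's own construction (for example, the choice of orthonormal basis) may differ.

```python
import math


def simplex_etf(num_classes, dim):
    """Return num_classes unit vectors in R^dim forming a simplex ETF.

    Hypothetical helper, not taken from the paper's code. It uses the
    standard formula w_i = sqrt(K / (K - 1)) * U (e_i - (1/K) * 1),
    with U chosen here as the first K standard basis vectors of R^dim.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if dim < num_classes:
        raise ValueError("this sketch assumes dim >= num_classes")
    scale = math.sqrt(num_classes / (num_classes - 1))
    vectors = []
    for i in range(num_classes):
        v = [0.0] * dim
        for j in range(num_classes):
            indicator = 1.0 if i == j else 0.0
            v[j] = scale * (indicator - 1.0 / num_classes)
        vectors.append(v)
    return vectors


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    W = simplex_etf(num_classes=4, dim=8)
    print("norm^2 of w_0:", dot(W[0], W[0]))
    print("cosine(w_0, w_1):", dot(W[0], W[1]))
```

Because every vector has unit norm, the inner product between two distinct vectors is also their cosine similarity, so all classes are separated by the same angle.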

Paper Structure (40 sections, 13 equations, 10 figures, 11 tables)

This paper contains 40 sections, 13 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Comparison of training error rates between online CL and offline CL in the CIFAR-10 disjoint setup, where two novel classes are added every 10k samples. Vanilla ETF refers to a method where both preparatory data training and residual correction are removed from our proposed EARL.
  • Figure 2: Overview of EARL. $\mathbf{w}_i$ denotes the ETF classifier vector for class $i$. $h_a$ denotes the output of the model. The colors of the data denote the class to which the data belong. The arrow $\mathbf{r}_i$ denotes the residual between the last-layer activation $h_i$ and the classifier vector $\mathbf{w}_i$ for class $i$. During training, both memory and preparatory data are used for replay, and the residuals between $h_i$ and $\mathbf{w}_i$ are stored in the feature-residual memory. During inference, using the similarity between $f(x_\text{eval})$ and each $h_i$ in the feature-residual memory, $\mathbf{r}_\text{eval}$ is obtained as a weighted sum of the $\mathbf{r}_i$'s. Finally, $f(x_\text{eval})$ is corrected by adding $\mathbf{r}_\text{eval}$. The purple arrow indicates 'residual correction' (Sec. \ref{sec:residual}).
  • Figure 3: Illustrative effects of each component of EARL. (a) In online CL, features of novel classes are biased towards the features of the previous class. (b) Training with preparatory data (w/ Prep. Data, Sec. \ref{sec:preparatory}) addresses the bias problem. (c) At inference, for features that have not fully converged to the ETF classifier, we add residuals (w/ Res. Corr., Sec. \ref{sec:residual}) so that they align with the corresponding classifier vectors. Purple arrow: residual correction. Colors: classes.
  • Figure 4: t-SNE visualization of the 'bias problem' in feature distributions (classes 0 to 3). (a) After only 100 training iterations following the arrival of task 1, learning is likely insufficient: the features of the new classes (classes 2 and 3) are biased towards the feature cluster of an existing class (i.e., class 1). (b) With more training iterations (10,000), the features are well clustered by class.
  • Figure 5: Average similarity between features of the most recently added class's samples and the closest classifier vectors of the old classes (CIFAR-10, Gaussian Scheduled). Baseline is a vanilla ETF model trained only using episodic memory.
  • ...and 5 more figures
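
The residual correction step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the softmax weighting over cosine similarities, the `temperature` value, and the sign convention (residual = classifier vector minus stored feature, so that adding it moves a feature toward its classifier vector) are all assumptions. The caption only states that the residual for an evaluation sample is a similarity-weighted sum of stored residuals.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length lists (0.0 if either is zero)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def residual_correct(feature, memory, temperature=0.1):
    """Correct `feature` using a feature-residual memory.

    `memory` is a list of (stored_feature, residual) pairs. Weights are a
    softmax over cosine similarities; this weighting is an assumption of
    the sketch, not a detail confirmed by the caption.
    """
    if not memory:
        return list(feature)
    sims = [cosine(feature, stored) for stored, _ in memory]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    # r_eval: weighted sum of the stored residuals
    r_eval = [
        sum(w * residual[d] for w, (_, residual) in zip(weights, memory))
        for d in range(len(feature))
    ]
    # corrected feature: f(x_eval) + r_eval
    return [f + r for f, r in zip(feature, r_eval)]


if __name__ == "__main__":
    memory = [
        ([1.0, 0.0], [0.0, 1.0]),
        ([0.0, 1.0], [1.0, 0.0]),
    ]
    print(residual_correct([1.0, 0.0], memory))
```

With a small temperature, the correction is dominated by the residuals of the stored features most similar to the evaluation feature; a larger temperature averages over more of the memory.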