LCA: Local Classifier Alignment for Continual Learning

Tung Tran; Danilo Vasconcellos Vargas; Khoat Than

TL;DR

The paper develops a complete continual learning solution that follows the model merging approach and uses a Local Classifier Alignment (LCA) loss, which helps the classifier both generalize well across all observed tasks and become more robust.

Abstract

A fundamental requirement for intelligent systems is the ability to learn continuously under changing environments. However, models trained in this regime often suffer from catastrophic forgetting. Leveraging pre-trained models has recently emerged as a promising solution, since their generalized feature extractors enable faster and more robust adaptation. While some earlier works mitigate forgetting by fine-tuning only on the first task, this approach quickly deteriorates as the number of tasks grows and the data distributions diverge. More recent research instead seeks to consolidate task knowledge into a unified backbone, or to adapt the backbone as new tasks arrive. However, such approaches may create a mismatch between the task-specific classifiers and the adapted backbone. To address this issue, we propose a novel Local Classifier Alignment (LCA) loss to better align the classifier with the backbone. Theoretically, we show that this LCA loss can enable the classifier not only to generalize well across all observed tasks, but also to become more robust. Furthermore, we develop a complete solution for continual learning, following the model merging approach and using LCA. Extensive experiments on several standard benchmarks demonstrate that our method often achieves leading performance, sometimes surpassing state-of-the-art methods by a large margin.
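The abstract describes consolidating task knowledge into a unified backbone through model merging, but this page does not state the merging rule itself. As a rough illustration only, the sketch below assumes a uniform running average over task-specific parameter sets. The function name `merge_incrementally`, the `Params` type, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical stand-in for a backbone: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def merge_incrementally(merged: Params, new_task: Params, num_merged: int) -> Params:
    """Fold one more task-specific parameter set into a running uniform average.

    `merged` is the average of `num_merged` earlier parameter sets.
    Uniform averaging is an assumption made for this sketch, not the paper's rule.
    """
    if merged.keys() != new_task.keys():
        raise ValueError("parameter sets must share the same parameter names")
    out: Params = {}
    for name, old in merged.items():
        new = new_task[name]
        if len(old) != len(new):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Running mean: weight the old average by how many sets it already covers.
        out[name] = [(num_merged * o + n) / (num_merged + 1) for o, n in zip(old, new)]
    return out


# Toy usage: three task-adapted backbones arriving one after another.
task_backbones: List[Params] = [
    {"w": [1.0, 2.0]},
    {"w": [2.0, 4.0]},
    {"w": [3.0, 6.0]},
]

merged_backbone = task_backbones[0]
for count, backbone in enumerate(task_backbones[1:], start=1):
    merged_backbone = merge_incrementally(merged_backbone, backbone, count)
```

Whatever the merging rule, the merged backbone generally differs from every backbone a task-specific classifier was trained against, which is the mismatch the abstract says the LCA loss is meant to address.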

Paper Structure (25 sections, 3 theorems, 18 equations, 13 figures, 3 tables, 1 algorithm)

Introduction
Related Works
Methodology
Problem Formulation
Incremental Knowledge Consolidation
Local Classifier Alignment
Theoretical analysis
Experiments
Experiment Setup
Experiment Results
Overall Benchmark
Robustness Measurement
Ablation Study
Conclusion
Proof for main Theorems
...and 10 more sections

Key Result

Theorem 3.1

Consider a model $h_t$ learned from a dataset ${\bm{D}} = \{{\bm{D}}_1, ..., {\bm{D}}_t\}$, where ${\bm{D}}_i$ contains $n_i$ i.i.d. samples from distribution ${\mathcal{N}}_i$ for each $i \le C_t$, and a bounded loss $\ell$. Denote $P = \frac{1}{C_t} \sum_{i=1}^{C_t} {\mathcal{N}}_i$ as the overall

Figures (13)

Figure 1: A comparison between IM and IM+LCA. IM is the result after only the Incremental Merging step, while IM+LCA adds Local Classifier Alignment as the final step.
Figure 2: Performance curves of different methods across all tasks and datasets. All methods use ViT-B/16-IN1K as the pre-trained backbone without any additional exemplars.
Figure 3: Effect of $\lambda$ on the accuracy on two datasets.
Figure 4: (a) Complementary evaluation of LCA when it is applied to MOS and SLCA. (b) Robustness performance of IM and IM+LCA on corruption and perturbation benchmarks.
Figure 5: Accuracy performance of IM and IM+LCA under different corruption and perturbation types. The relative difference between IM and IM+LCA is highlighted.
...and 8 more figures

Theorems & Definitions (6)

Theorem 3.1
Corollary 1
Remark 1
Theorem 3.2
Proof of Theorem \ref{thm-LCA-generalization}
Proof of Theorem \ref{thm-LCA-generalization-change}
