OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning

Noor Ahmed; Anna Kukleva; Bernt Schiele

TL;DR

OrCo tackles few-shot class-incremental learning (FSCIL) by enforcing orthogonality in the feature space and using contrastive learning to generalize across increments. The method proceeds in three phases: Phase 1 pretrains the model with supervised and self-supervised contrastive losses; Phase 2 aligns the base data to fixed, mutually orthogonal pseudo-targets via the OrCo loss; and Phase 3 repeats a similar few-shot alignment for new classes, using perturbations to widen margins. Key contributions include the Perturbed Supervised Contrastive Loss (PSCL), an explicit orthogonality constraint, and a pseudo-target framework that reserves space for upcoming classes. The method achieves state-of-the-art results on mini-ImageNet, CIFAR100, and CUB200. It improves robustness to forgetting and overfitting, which matters for real-world FSCIL tasks where incremental data is scarce.

Abstract

Few-Shot Class-Incremental Learning (FSCIL) introduces a paradigm in which the problem space expands with limited data. FSCIL methods inherently face the challenge of catastrophic forgetting as data arrives incrementally, making models susceptible to overwriting previously acquired knowledge. Moreover, given the scarcity of labeled samples available at any given time, models may be prone to overfitting and find it challenging to strike a balance between extensive pretraining and the limited incremental data. To address these challenges, we propose the OrCo framework, built on two core principles: orthogonality of features in the representation space, and contrastive learning. In particular, we improve the generalization of the embedding space by employing a combination of supervised and self-supervised contrastive losses during the pretraining phase. Additionally, we introduce the OrCo loss to address challenges arising from data limitations during incremental sessions. Through feature space perturbations and orthogonality between classes, the OrCo loss maximizes margins and reserves space for the following incremental data. This, in turn, ensures the accommodation of incoming classes in the feature space without compromising previously acquired knowledge. Our experimental results showcase state-of-the-art performance across three benchmark datasets: mini-ImageNet, CIFAR100, and CUB. Code is available at https://github.com/noorahmedds/OrCo
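
To make the idea of mutually orthogonal targets concrete, the following is a minimal sketch, not the authors' implementation. It swaps in plain Gram-Schmidt orthogonalization to build the targets, whereas the figure captions on this page refer to an orthogonality optimization. The function names, the dimension, and the number of targets are all hypothetical choices made for illustration.

```python
import math
import random


def orthogonal_targets(num_targets, dim, seed=0):
    """Return num_targets mutually orthogonal unit vectors in R^dim.

    Assumption: Gram-Schmidt on random Gaussian draws stands in for
    whatever target-generation procedure the paper actually uses.
    """
    if num_targets > dim:
        raise ValueError("at most `dim` mutually orthogonal vectors exist in R^dim")
    rng = random.Random(seed)
    targets = []
    while len(targets) < num_targets:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components of v that lie along already accepted targets.
        for t in targets:
            dot = sum(a * b for a, b in zip(v, t))
            v = [a - dot * b for a, b in zip(v, t)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip degenerate draws
            targets.append([a / norm for a in v])
    return targets


def mean_pairwise_angle_deg(vectors):
    """Average angle in degrees over all distinct pairs (needs >= 2 unit vectors)."""
    angles = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            cos = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
            angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


# Hypothetical sizes: 10 targets in a 16-dimensional feature space.
targets = orthogonal_targets(num_targets=10, dim=16)
print(round(mean_pairwise_angle_deg(targets), 3))
```

The average pairwise angle serves as a sanity check: for a mutually orthogonal set it should sit at 90 degrees up to floating-point error. Targets that no class has claimed yet are what "reserving space" refers to in the abstract.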

Paper Structure (18 sections, 16 equations, 7 figures, 13 tables)

Introduction
Related Work
OrCo Framework
Preliminaries
OrCo Framework
OrCo Loss
Experimental Results
Datasets and Evaluation
Comparison to state-of-the-art
Analysis
Conclusion
More Ablations
More Results
Theory of Orthogonality
Contrastive losses
...and 3 more sections

Figures (7)

Figure 1: PCA analysis on feature space before and after alignment. Left: Before aligning incremental classes to orthogonal pseudo-targets. Right: After aligning incremental classes to assigned targets using OrCo loss. Our loss effectively reduces misalignment. Additionally, it enhances generalization for incoming classes by explicitly reserving space.
Figure 2: Overview of OrCo framework. Our OrCo framework is a three-phase approach for FSCIL. Phase 1 (Pretrain): We pretrain both backbone and projection head with SCL and SSCL on base dataset $D^0$. Before the next phase, we generate mutually orthogonal pseudo-targets. Phase 2 (Base Alignment): We aim to align the base dataset $D^0$ to the pseudo-targets through our OrCo loss. This involves pulling class features towards the nearest pseudo-targets and pushing forces based on perturbations around unassigned pseudo-targets (grey stars without assigned colored class means) to increase the margin and preserve space for incoming classes. Phase 3 (Few-Shot Alignment): Phase 3, employed in each subsequent incremental session, is similar to Phase 2 and assigns pseudo-targets to incremental class means with further alignment using our OrCo loss.
Figure 3: OrCo loss consists of three components: our proposed perturbed supervised contrastive loss (PSCL), cross-entropy loss (CE), and orthogonality loss (ORTH). $z_j$ denotes the real data anchor point for a contrastive loss, $\bar{z}_j$ denotes the unassigned pseudo-target anchor, and $t_j$ denotes an additional positive sample for the yellow class in the form of an assigned pseudo-target. $\mu_j$ represents the within-batch mean features.
Figure 4: SOTA comparisons on CIFAR100 and CUB200 datasets. Performance curves, measured by harmonic mean, comparing our method to recent SOTA methods. Left: CIFAR100. Right: CUB200. $\Delta$aHM denotes the average harmonic mean improvement over the runner-up method.
Figure 5: Measurement of angle during orthogonality optimization. The green curve corresponds to the evolution of the average angle between all pairs during the optimization. The gray curve shows measurements of random pairs at each epoch.
...and 2 more figures
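
The harmonic-mean metric named in the Figure 4 caption can be illustrated with a short sketch. It assumes, as is common in FSCIL evaluation but not stated on this page, that the harmonic mean is taken between base-class and novel-class accuracy in each session, and that aHM averages it over sessions. The function names and the accuracy values are made up for illustration and are not results from the paper.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of two accuracies given in the same units (e.g. percent)."""
    if base_acc + novel_acc == 0:
        return 0.0
    return 2.0 * base_acc * novel_acc / (base_acc + novel_acc)


def average_harmonic_mean(per_session):
    """Average the per-session harmonic mean over (base_acc, novel_acc) pairs."""
    values = [harmonic_mean(base, novel) for base, novel in per_session]
    return sum(values) / len(values)


# Made-up accuracies for two incremental sessions: (base-class %, novel-class %).
sessions = [(80.0, 20.0), (75.0, 25.0)]
print(harmonic_mean(80.0, 20.0))        # 32.0
print(average_harmonic_mean(sessions))  # 34.75
```

Unlike an arithmetic mean, the harmonic mean stays low when either accuracy is low, so a model that retains base classes but fails on new ones is penalized.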
