OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed, Anna Kukleva, Bernt Schiele
TL;DR
OrCo tackles few-shot class-incremental learning (FSCIL) by enforcing orthogonality between classes in feature space and using contrastive learning to generalize across incremental sessions. The method proceeds in three phases: (1) pretraining with supervised and self-supervised contrastive losses; (2) aligning base-class data to fixed, mutually orthogonal pseudo-targets via the OrCo loss; and (3) repeating a similar alignment for each few-shot session of new classes, with perturbations that widen the margins between classes. Key contributions include the Perturbed Supervised Contrastive Loss (PSCL), an explicit orthogonality constraint, and a pseudo-target framework that reserves space for upcoming classes. OrCo achieves state-of-the-art results on mini-ImageNet, CIFAR100, and CUB200. The approach improves robustness to forgetting and overfitting, which matters in real-world FSCIL settings where incremental data is scarce.
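To make the idea of fixed, mutually orthogonal pseudo-targets with reserved space concrete, here is a minimal sketch. It uses Gram-Schmidt orthogonalization of random vectors, which is only one simple way to obtain such targets and is an assumption of this example, not necessarily the procedure used in OrCo. The function name `orthonormal_targets` and the sizes `NUM_BASE`, `NUM_RESERVED`, and `DIM` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_targets(num_targets, dim, seed=0):
    """Return num_targets mutually orthogonal unit vectors in R^dim (Gram-Schmidt)."""
    if num_targets > dim:
        raise ValueError("at most `dim` mutually orthogonal vectors exist in R^dim")
    rng = random.Random(seed)
    targets = []
    while len(targets) < num_targets:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the targets already accepted.
        for t in targets:
            proj = dot(v, t)
            v = [a - proj * b for a, b in zip(v, t)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-8:  # skip a (vanishingly unlikely) degenerate draw
            targets.append([a / norm for a in v])
    return targets


# Hypothetical sizes: 6 base classes now, 4 targets reserved for later sessions.
NUM_BASE, NUM_RESERVED, DIM = 6, 4, 16
targets = orthonormal_targets(NUM_BASE + NUM_RESERVED, DIM)
base_targets, reserved_targets = targets[:NUM_BASE], targets[NUM_BASE:]

# Largest absolute cosine similarity between any two distinct targets.
max_off_diag = max(
    abs(dot(targets[i], targets[j]))
    for i in range(len(targets))
    for j in range(i + 1, len(targets))
)
print(f"largest |cosine| between distinct targets: {max_off_diag:.2e}")
```

Because all targets are generated up front, the reserved ones stay orthogonal to the base-class targets, so a later session can claim them without moving anything already assigned.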
Abstract
Few-Shot Class-Incremental Learning (FSCIL) introduces a paradigm in which the problem space expands with limited data. FSCIL methods inherently face the challenge of catastrophic forgetting as data arrives incrementally, making models susceptible to overwriting previously acquired knowledge. Moreover, given the scarcity of labeled samples available at any given time, models may be prone to overfitting and find it challenging to strike a balance between extensive pretraining and the limited incremental data. To address these challenges, we propose the OrCo framework, built on two core principles: orthogonality of features in the representation space, and contrastive learning. In particular, we improve the generalization of the embedding space by employing a combination of supervised and self-supervised contrastive losses during the pretraining phase. Additionally, we introduce the OrCo loss to address challenges arising from data limitations during incremental sessions. Through feature space perturbations and orthogonality between classes, the OrCo loss maximizes margins and reserves space for the following incremental data. This, in turn, ensures the accommodation of incoming classes in the feature space without compromising previously acquired knowledge. Our experimental results showcase state-of-the-art performance on three benchmark datasets: mini-ImageNet, CIFAR100, and CUB. Code is available at https://github.com/noorahmedds/OrCo
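The interplay between perturbations and a supervised contrastive objective can be illustrated with a small sketch. It is not the authors' exact loss: it applies the standard supervised contrastive form to a batch in which each class target and a few noisy copies of it serve as extra positives. The Gaussian noise model, the noise scale of 0.05, the temperature of 0.1, and the helper names `perturb`, `sup_con_loss`, and `batch_loss` are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def perturb(target, scale, rng):
    """Add Gaussian noise to a unit vector and project back onto the unit sphere."""
    noisy = [a + rng.gauss(0.0, scale) for a in target]
    norm = math.sqrt(dot(noisy, noisy))
    return [a / norm for a in noisy]


def sup_con_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over unit vectors, averaged over anchors with positives."""
    total, count = 0.0, 0
    for i, z_i in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: dot(z_i, features[j]) / temperature for j in others}
        peak = max(logits.values())  # log-sum-exp shift for numerical stability
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total -= sum(logits[p] - log_denom for p in positives) / len(positives)
        count += 1
    return total / count


def batch_loss(sample_labels, rng):
    """Loss for a batch of two samples per class plus targets and their perturbed copies."""
    dim, num_perturbed = 8, 3
    # Two orthogonal targets: the first two standard basis vectors.
    targets = [[1.0 if k == c else 0.0 for k in range(dim)] for c in range(2)]
    feats, labels = [], []
    for c, target in enumerate(targets):
        # Each target and its perturbed copies act as extra positives for class c.
        feats.append(target)
        labels.append(c)
        for _ in range(num_perturbed):
            feats.append(perturb(target, 0.05, rng))
            labels.append(c)
        # Two samples that lie near target c but carry the label sample_labels[c].
        for _ in range(2):
            feats.append(perturb(target, 0.05, rng))
            labels.append(sample_labels[c])
    return sup_con_loss(feats, labels)


aligned_loss = batch_loss(sample_labels=[0, 1], rng=random.Random(0))
swapped_loss = batch_loss(sample_labels=[1, 0], rng=random.Random(0))
print(f"aligned: {aligned_loss:.3f}  swapped: {swapped_loss:.3f}")
```

In this toy setup the loss is lower when samples sit near their own class target than when their labels are swapped, which is the pull toward assigned targets that the abstract describes. The perturbed copies populate a neighborhood around each target rather than a single point.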
