Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision Transformers

Dipam Goswami; Bartłomiej Twardowski; Joost van de Weijer

TL;DR

This paper tackles FSCIL with ViT backbones by addressing the poor estimation of higher-order statistics for few-shot classes. It introduces a statistics calibration framework that uses semantic similarity to weight base-class covariance and mean estimates, yielding calibrated covariances and means for new classes. When combined with FeCAM and RanPAC, the approach significantly improves harmonic mean accuracy across FSCIL benchmarks, demonstrating that leveraging base-class statistics can enhance few-shot generalization. The method is practical: it requires no extra training beyond the initial adaptor-based base-task adaptation, and it scales well across datasets.

Abstract

Few-shot class-incremental learning (FSCIL) aims to adapt the model to new classes from very little data (5 samples) without forgetting the previously learned classes. Recent works in many-shot CIL (MSCIL), which uses all available training data, exploited pre-trained models to reduce forgetting and achieve better plasticity. In a similar fashion, we use ViT models pre-trained on large-scale datasets for few-shot settings, which face the critical issue of low plasticity. FSCIL methods start with a many-shot first task to learn a very good feature extractor and then move to the few-shot setting from the second task onwards. While most recent studies focus on how to learn the many-shot first task so that the model generalizes to all future few-shot tasks, we explore in this work how to better model the few-shot data using pre-trained models, irrespective of how the first task is trained. Inspired by recent works in MSCIL, we explore how using higher-order feature statistics can influence the classification of few-shot classes. We identify the main challenge as obtaining a good covariance matrix from few-shot data and propose to calibrate the covariance matrix for new classes based on semantic similarity to the many-shot base classes. Using the calibrated feature statistics in combination with existing methods significantly improves few-shot continual classification on several FSCIL benchmarks. Code is available at https://github.com/dipamgoswami/FSCIL-Calibration.
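
The calibration idea in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `calibrate`, the softmax weighting over cosine similarities, and the parameters `alpha`, `beta` and `temperature` are all hypothetical choices made here, and the toy statistics are made up by hand. The released code at the URL above is the authoritative reference for the actual formulation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two prototype vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(scores, temperature):
    # Numerically stable softmax over temperature-scaled scores.
    m = max(scores)
    exps = [math.exp(temperature * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def calibrate(new_mean, new_cov, base_means, base_covs,
              alpha=0.9, beta=1.0, temperature=16.0):
    """Blend few-shot statistics with similarity-weighted base-class statistics.

    Assumed form (hypothetical, for illustration only):
      mean = alpha * new_mean + (1 - alpha) * sum_b w_b * base_mean_b
      cov  = new_cov + beta * sum_b w_b * base_cov_b
    where w is a softmax over cosine similarities to the base prototypes.
    """
    weights = softmax([cosine(new_mean, m) for m in base_means], temperature)
    d = len(new_mean)
    base_mean = [sum(w * m[i] for w, m in zip(weights, base_means)) for i in range(d)]
    base_cov = [[sum(w * c[i][j] for w, c in zip(weights, base_covs))
                 for j in range(d)] for i in range(d)]
    mean = [alpha * new_mean[i] + (1 - alpha) * base_mean[i] for i in range(d)]
    cov = [[new_cov[i][j] + beta * base_cov[i][j] for j in range(d)] for i in range(d)]
    return mean, cov, weights

# Toy 2-D example: two base classes with reliable statistics and one new
# class whose few-shot covariance estimate is degenerate (all zeros).
base_means = [[1.0, 0.0], [0.0, 1.0]]
base_covs = [[[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]]]
new_mean = [0.9, 0.1]
new_cov = [[0.0, 0.0], [0.0, 0.0]]

mean, cov, weights = calibrate(new_mean, new_cov, base_means, base_covs)
print("weights:", weights)
print("calibrated mean:", mean)
print("calibrated covariance:", cov)
```

In this toy case the new prototype lies close to the first base class, so that class dominates the weights and the calibrated covariance inherits most of its structure, which is the behaviour the abstract motivates.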

Paper Structure (13 sections, 10 equations, 3 figures, 5 tables)

Introduction
Related Work
Method
Motivation
Statistics Calibration
Calibration with existing methods
FeCAM
RanPAC
Experiments
Quantitative Evaluation
Ablation Studies
Conclusion
Acknowledgement.

Figures (3)

Figure 1: Performance of different prototype-based classification methods on FSCIL settings with ViT-B/16 pre-trained on ImageNet-21k. All the methods, NCM, TEEN (Wang et al., 2023), FeCAM (Goswami et al., 2023) and RanPAC (McDonnell et al., 2023), are biased towards the base task classes. While TEEN improves the performance on the few-shot classes through prototype calibration, the methods using second-order feature statistics, FeCAM and RanPAC, perform much worse on the few-shot classes than on the many-shot base task classes. This drop in performance for new classes can be attributed to the poor estimates of second-order statistics from few-shot data. We propose to calibrate the covariance matrix of few-shot classes by using the strong covariance estimates of base classes. We observe that with our proposed calibration, C-FeCAM and C-RanPAC improve performance significantly on the new classes, leading to better overall accuracy.
Figure 2: Illustration of how the similarity of class covariance matrices varies with the distance between the class prototypes. We train the model on 28 base classes of the Stanford Cars dataset and plot the covariance similarity against the prototype distance between a new class (from task 1) and the base classes. We observe that classes with similar prototypes (a smaller distance between the prototypes) have higher covariance similarities.
Figure 3: Accuracy after each incremental task for big-start settings on CUB-200, Stanford Cars and FGVC-Aircraft. Our proposed statistics calibration improves the average accuracy consistently after all tasks.
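
The relationship described in the Figure 2 caption can be probed with a short script. The sketch below is only an illustration: it assumes Euclidean distance between prototypes and cosine similarity between flattened covariance matrices, which may differ from the measures used to produce the figure, and the helper names and toy numbers are hypothetical, chosen by hand rather than taken from the paper.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two class prototypes.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def covariance_similarity(cov_a, cov_b):
    # Cosine similarity between flattened covariance matrices
    # (an assumed similarity measure, used here for illustration).
    flat_a = [x for row in cov_a for x in row]
    flat_b = [x for row in cov_b for x in row]
    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    return dot / (norm_a * norm_b)

def compare_to_base(new_mean, new_cov, base_stats):
    # Return (prototype distance, covariance similarity) pairs,
    # sorted so that the nearest base class comes first.
    pairs = [(euclidean(new_mean, mean), covariance_similarity(new_cov, cov))
             for mean, cov in base_stats]
    return sorted(pairs)

# Hand-made toy statistics: one nearby base class and one distant one.
base_stats = [
    ([1.0, 0.0], [[2.0, 0.5], [0.5, 1.0]]),
    ([5.0, 5.0], [[1.0, -0.8], [-0.8, 4.0]]),
]
new_mean = [1.2, 0.1]
new_cov = [[1.8, 0.4], [0.4, 1.1]]

pairs = compare_to_base(new_mean, new_cov, base_stats)
for dist, sim in pairs:
    print(f"prototype distance {dist:.3f} -> covariance similarity {sim:.3f}")
```

Plotting such pairs for a real new class against all base classes would give a scatter comparable in spirit to Figure 2, provided the class means and covariances are computed from the same feature extractor.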
