NTK-Guided Few-Shot Class Incremental Learning

Jingren Liu; Zhong Ji; Yanwei Pang; YunLong Yu

TL;DR

This work introduces NTK-FSCIL, a theoretical and empirical framework that analyzes anti-amnesia in Few-Shot Class Incremental Learning (FSCIL) through Neural Tangent Kernel (NTK) dynamics. It targets two things jointly: NTK convergence, via a meta-learning strategy applied to a widened network, and NTK-related generalization, via self-supervised pre-training on the base session, curricular logit-label alignment, and dual NTK regularization on convolutional and linear layers. The approach reports state-of-the-art results on CIFAR100, miniImageNet, CUB200-2011, and ImageNet100, raising end-session accuracy by 2.9% to 9.3% over prior methods and reducing forgetting. It also examines how network width and the NTK spectrum shape generalization in continual learning.

Abstract

The proliferation of Few-Shot Class Incremental Learning (FSCIL) methodologies has highlighted the critical challenge of maintaining robust anti-amnesia capabilities in FSCIL learners. In this paper, we present a novel conceptualization of anti-amnesia in terms of mathematical generalization, leveraging the Neural Tangent Kernel (NTK) perspective. Our method focuses on two key aspects: ensuring optimal NTK convergence and minimizing NTK-related generalization loss, which serve as the theoretical foundation for cross-task generalization. To achieve global NTK convergence, we introduce a principled meta-learning mechanism that guides optimization within an expanded network architecture. Concurrently, to reduce the NTK-related generalization loss, we systematically optimize its constituent factors. Specifically, we initiate self-supervised pre-training on the base session to enhance NTK-related generalization potential. These self-supervised weights are then carefully refined through curricular alignment, followed by the application of dual NTK regularization tailored specifically for both convolutional and linear layers. Through the combined effects of these measures, our network acquires robust NTK properties, ensuring optimal convergence and stability of the NTK matrix and minimizing the NTK-related generalization loss, significantly enhancing its theoretical generalization. On popular FSCIL benchmark datasets, our NTK-FSCIL surpasses contemporary state-of-the-art approaches, elevating end-session accuracy by 2.9\% to 9.3\%.
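To make the phrase "NTK matrix" concrete, the following is a minimal sketch of an empirical NTK computed for a toy two-layer tanh network in NumPy. It is not the authors' implementation: the network, the function names (`init_params`, `jacobian`, `empirical_ntk`), the width of 512, and the random inputs are all assumptions chosen for illustration. The kernel is formed as the Gram matrix of per-example parameter gradients, and its eigenvalues are the kind of spectrum that the abstract's convergence and regularization arguments refer to.

```python
import numpy as np

def init_params(d_in, width, seed=0):
    """Random Gaussian weights for f(x) = v . tanh(W x) / sqrt(width) (toy model, assumed)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, d_in))
    v = rng.standard_normal(width)
    return W, v

def forward(X, W, v):
    """Scalar network output for each row of X."""
    return np.tanh(X @ W.T) @ v / np.sqrt(W.shape[0])

def jacobian(X, W, v):
    """Per-example gradient of the output w.r.t. all parameters (W flattened, then v)."""
    width = W.shape[0]
    H = np.tanh(X @ W.T)                  # hidden activations, shape (n, width)
    dH = 1.0 - H ** 2                     # derivative of tanh at the pre-activations
    # d f / d W[k, :] = v[k] * tanh'(w_k . x) * x / sqrt(width)
    JW = (dH * v)[:, :, None] * X[:, None, :] / np.sqrt(width)   # (n, width, d_in)
    # d f / d v[k] = tanh(w_k . x) / sqrt(width)
    Jv = H / np.sqrt(width)                                       # (n, width)
    return np.concatenate([JW.reshape(len(X), -1), Jv], axis=1)

def empirical_ntk(X1, X2, W, v):
    """Empirical NTK: inner products of parameter gradients between two input sets."""
    return jacobian(X1, W, v) @ jacobian(X2, W, v).T

X = np.random.default_rng(1).standard_normal((5, 3))   # 5 made-up inputs of dimension 3
W, v = init_params(d_in=3, width=512)
K = empirical_ntk(X, X, W, v)
eigvals = np.linalg.eigvalsh(K)                         # NTK spectrum, ascending order
```

Because `K` is a Gram matrix, it is symmetric and positive semi-definite by construction; how close its smallest eigenvalue is to zero depends on the inputs and the width.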

Paper Structure (24 sections, 3 theorems, 19 equations, 8 figures, 14 tables)

Introduction
Related Works
NTK Theoretical Foundation in FSCIL
FSCIL Problem Formulation
NTK Dynamics in FSCIL
Generalization and NTK Analysis
Meta-Learning Convergence in FSCIL
Reducing Generalization Loss in FSCIL
Initialize Weights: Self-Supervised Pre-Training
Better Alignment: Margin-based Loss
Refine NTK eigenvalues: Dual NTK Regularization
Experiments
Experimental Setup
Impacts of Network Width in FSCIL
Exploring Self-supervised Pre-training
...and 9 more sections

Key Result

Theorem 1

Assuming negligible learning rates $\eta$ and $\lambda$, and given a network width $l$ approaching infinity, the meta-outputs $F_t$ for inputs $\mathbf{X}_j \in \mathbf{X}_j^t$, in relation to the training pairs $(\mathbf{X}_i, \mathbf{Y}_i) \in (\mathbf{X}_i^t, \mathbf{Y}_i^t)$, are highly likely t

Figures (8)

Figure 1: The difference in FSCIL performance amongst various self-supervised learners, utilizing ResNet-18$\times$2 on CIFAR100.
Figure 2: The FSCIL performance on CIFAR100 across different widths in ResNet-18, employing CEC [zhang2021few] and ALICE [peng2022few].
Figure 3: FSCIL results of four ConvNets (ResNet-12, ResNet-20, ResNet-34, and ConvNeXt), built on CEC and ALICE, under different width expansions on CIFAR100.
Figure 4: Top-1 accuracy for base and incremental sessions, assessed under diverse parameters on CIFAR100.
Figure 5: Top-1 accuracy for base and incremental sessions, assessed under diverse parameters on CUB200.
...and 3 more figures

Theorems & Definitions (4)

Definition 1: Neural Tangent Kernel
Theorem 1: NTK-related Meta-Learning Output
Theorem 2: NTK-related Meta-Learning Convergence
Theorem 3: NTK-related Generalization
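
For orientation, the standard form of the Neural Tangent Kernel from the general NTK literature is given below. The paper's Definition 1 is not reproduced on this page, so the notation here ($f$, $\theta_t$, $\Theta_t$) is an assumption and may differ from the authors' own.

$$
\Theta_t(\mathbf{x}, \mathbf{x}') = \nabla_{\theta} f(\mathbf{x}; \theta_t)\, \nabla_{\theta} f(\mathbf{x}'; \theta_t)^{\top}
$$

Here $f(\cdot\,; \theta_t)$ denotes the network output with parameters $\theta_t$ at training step $t$. In the infinite-width limit with a small learning rate, this kernel is known to stay approximately constant during training, which is the regime that Theorem 1 assumes.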
