Learning in the Null Space: Small Singular Values for Continual Learning

Cuong Anh Pham; Praneeth Vepakomma; Samuel Horváth

TL;DR

This paper introduces NESS (Null-space Estimated from Small Singular values), a continual learning (CL) method that applies orthogonality directly in the weight space rather than through gradient manipulation. It demonstrates competitive performance, low forgetting, and stable accuracy across tasks, highlighting the role of small singular values in continual learning.

Abstract

Alleviating catastrophic forgetting while enabling further learning is a primary challenge in continual learning (CL). Orthogonality-based training methods have gained attention for their efficiency and strong theoretical properties, and many existing approaches enforce orthogonality through gradient projection. In this paper, we revisit orthogonality and exploit the fact that small singular values correspond to directions that are nearly orthogonal to the input space of previous tasks. Building on this principle, we introduce NESS (Null-space Estimated from Small Singular values), a CL method that applies orthogonality directly in the weight space rather than through gradient manipulation. Specifically, NESS constructs an approximate null space using the smallest singular values of each layer's input representation and parameterizes task-specific updates via a compact low-rank adaptation (LoRA-style) formulation constrained to this subspace. The subspace basis is fixed to preserve the null-space constraint, and only a single trainable matrix is learned for each task. This design ensures that the resulting updates remain approximately in the null space of previous inputs while enabling adaptation to new tasks. Our theoretical analysis and experiments on three benchmark datasets demonstrate competitive performance, low forgetting, and stable accuracy across tasks, highlighting the role of small singular values in continual learning. The code is available at https://github.com/pacman-ctm/NESS.
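
The construction described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is a toy example under stated assumptions: a single linear layer written as y = xW with W of shape (d_in, d_out), an uncentered covariance of previous-task inputs, and a fixed number k of retained directions. The helper name `small_singular_basis`, the toy data, and the random stand-in for a trained V are all hypothetical.

```python
import numpy as np

def small_singular_basis(X, k):
    """Return k orthonormal input directions with the smallest singular values.

    X: (n_samples, d_in) layer inputs collected from previous tasks.
    """
    cov = X.T @ X / X.shape[0]          # (d_in, d_in) uncentered covariance (assumption)
    U_full, S, _ = np.linalg.svd(cov)   # singular values are sorted in descending order
    return U_full[:, -k:], S[-k:]       # keep the k smallest

rng = np.random.default_rng(0)
n, d_in, d_out, k = 200, 16, 5, 8

# Toy "previous-task" inputs: approximately rank 4, plus small noise.
X_old = rng.normal(size=(n, 4)) @ rng.normal(size=(4, d_in))
X_old += 1e-3 * rng.normal(size=(n, d_in))

U, S_small = small_singular_basis(X_old, k)   # frozen basis
V = np.zeros((k, d_out))                      # trainable matrix, zero-initialized
delta_W = U @ V                               # the update is exactly zero before training

# Stand-in for training on a new task: whatever V becomes, delta_W stays in span(U).
V = rng.normal(size=(k, d_out))
delta_W = U @ V

# Change in the layer's outputs on previous-task inputs, and a simple bound on it:
# ||X U V||_F <= ||X U||_2 * ||V||_F, with ||X U||_2^2 = n * (largest retained singular value).
drift = np.linalg.norm(X_old @ delta_W)
bound = np.sqrt(n * S_small.max()) * np.linalg.norm(V)
print(f"drift={drift:.4e}  bound={bound:.4e}")
```

In this sketch only V would receive gradients, so every update stays in the span of U. The change in old-task outputs is controlled by the largest singular value among the retained directions, which is why the choice of k trades stability against plasticity.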

Paper Structure (20 sections, 15 equations, 5 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Methodology
Problem Definition
Our Proposed Method
Stability Constraint (Output Preservation).
Plasticity Objective (New Task Learning).
Construction of the Stability Subspace.
Selecting the Small-Singular-Value Subspace.
Explicit Stability Bound.
Practical Enforcement.
Training Paradigm
Experiments
Experimental Setup
Experimental Results
...and 5 more sections

Figures (5)

Figure 1: Overview of NESS. During continual training (left), each task updates the network sequentially. For every task $t$ and layer $l$, we collect the concatenated inputs from previous tasks and compute the SVD of the corresponding covariance matrix (right). The update is parameterized as a low-rank decomposition $\Delta W_t^{(l)} = U_t^{(l)} V_t^{(l)}$, where $U_t^{(l)}$ is a frozen orthogonal basis constructed from singular vectors associated with small singular values, and $V_t^{(l)}$ is a trainable matrix initialized to zero. This structured update constrains learning to an approximate null subspace of previous inputs, limiting interference while enabling adaptation to the current task.
Figure 2: Performance of (a) NESS compared to (b) SGP, (c) TRGP, and (d) DFGP on the miniImageNet experiment (seed 3), using the best setting of each method and focusing on forgetting rate. Green cells denote improved performance and red cells denote degraded performance, relative to the blue cell showing performance when the task was first trained.
Figure 3: Performance of (a) NESS compared to (b) SGP, (c) TRGP, and (d) DFGP on the CIFAR-100 experiment (seed 2), using the best setting of each method and focusing on forgetting rate. Green cells denote improved performance and red cells denote degraded performance, relative to when the task was first trained.
Figure 4: Performance of (a) NESS compared to (b) SGP, (c) TRGP, and (d) DFGP on the CIFAR-100 experiment (seed 3), using the best setting of each method and focusing on forgetting rate. Green cells denote improved performance and red cells denote degraded performance, relative to when the task was first trained.
Figure 5: Performance of (a) NESS compared to (b) SGP, (c) TRGP, and (d) DFGP on the miniImageNet experiment (seed 37), using the best setting of each method and focusing on forgetting rate. Green cells denote improved performance and red cells denote degraded performance, relative to when the task was first trained.
