Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning

Zhongyi Zhou, Yaxin Peng, Pin Yi, Minjie Zhu, Chaomin Shen

TL;DR

The paper tackles feature entanglement and forgetting in continual learning by enforcing discriminative separation through fixed simplex ETF targets on a hypersphere and a dynamic mixture-of-experts to realize task-specific projections. It introduces a Dot-Regression Loss to align normalized features with ETF pseudo targets and employs an incremental MoE with per-task routing and freezing to preserve domain-specific representations. Empirical evaluation across 11 datasets under multi-task incremental learning shows consistent gains over strong baselines in both full-shot and few-shot settings, with a compact model footprint due to sparsity. The approach demonstrates that combining ETF-inspired targets with adaptive subspace projections effectively maintains discriminative feature representations across evolving task domains.

Abstract

Continual Learning enables models to learn and adapt to new tasks while retaining prior knowledge. Introducing new tasks, however, can naturally lead to feature entanglement across tasks, limiting the model's ability to discriminate among data from new domains. In this work, we propose a method called Feature Realignment through Experts on hyperSpHere in Continual Learning (Fresh-CL). By leveraging predefined and fixed simplex equiangular tight frame (ETF) classifiers on a hypersphere, our model improves feature separation both within and across tasks. However, the projection to a simplex ETF shifts with new tasks, disrupting the structured feature representation of previous tasks and degrading performance. Therefore, we propose a dynamic extension of ETF through mixture of experts, enabling adaptive projections onto diverse subspaces to enhance feature representation. Experiments on 11 datasets demonstrate a 2% improvement in accuracy compared to the strongest baseline, particularly in fine-grained datasets, confirming the efficacy of combining ETF and MoE to improve feature distinction in continual learning scenarios.
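
The fixed simplex ETF targets and the Dot-Regression (DR) loss mentioned above can be illustrated with a short NumPy sketch. It uses the standard simplex ETF construction, $\sqrt{K/(K-1)}\, U (I_K - \tfrac{1}{K}\mathbf{1}\mathbf{1}^\top)$ with orthonormal $U$. The function names, the random choice of $U$, and the exact loss form $\tfrac{1}{2}(\cos - 1)^2$ are assumptions made for illustration and are not taken from the paper's code.

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a (dim, num_classes) matrix whose columns form a simplex ETF."""
    if dim < num_classes:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Orthonormal columns U (dim x K) from a reduced QR decomposition.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    # Columns have unit norm and pairwise inner product -1 / (K - 1).
    return np.sqrt(k / (k - 1)) * u @ centering

def dot_regression_loss(features: np.ndarray, labels: np.ndarray,
                        etf: np.ndarray) -> float:
    """Assumed DR form: 0.5 * (cos(feature, target) - 1)^2, batch-averaged."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    targets = etf[:, labels].T              # (batch, dim) fixed pseudo targets
    cos = np.sum(unit * targets, axis=1)    # targets are already unit-norm
    return float(np.mean(0.5 * (cos - 1.0) ** 2))
```

Because the targets are fixed, only the features move during training; the loss reaches zero exactly when every normalized feature coincides with its class target.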

Paper Structure

This paper contains 12 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: t-SNE visualization of Aircraft dataset features. (a) Features extracted by MOA tend to blend together, making them recognizable as a single "aircraft" class. (b) Our method improves feature separation, enabling distinct class recognition.
  • Figure 2: Overall framework of Fresh-CL. When training Task $\bm t$, each input image $\boldsymbol x_{k}$ is fed into a frozen backbone and then passed through a dynamic routing mechanism, which selects the top-$k$ (e.g., top-2) experts $i$ and $j$ using gating weights $G_i^t$ and $G_j^t$. These experts project the normalized features $\hat{\boldsymbol{\mu}}_{k}^{i}$ and $\hat{\boldsymbol{\mu}}_{k}^{j}$ onto distinct hyperspheres, aligning each feature with its corresponding predefined "pseudo target" $\bm w_{a}^i$ or $\bm w_{a}^j$ on the relevant hypersphere. The subscripts $a, b, c$ denote arbitrary labels from the label space $C^t$. The pseudo targets are predefined as equiangular vectors $\hat{\bm{W}}_{ETF}^{i}$ and $\hat{\bm{W}}_{ETF}^{j}$ for each class of all tasks. The DR loss pulls each feature toward its corresponding target, thereby achieving maximal separation on each hypersphere selected by the MoE module.
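
The routing step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear router, the linear experts, the softmax over only the selected logits, and all names (`top_k_gate`, `route_and_project`, `router_w`, `expert_ws`) are assumptions. Per-task routers and expert freezing are not modeled.

```python
import numpy as np

def top_k_gate(logits: np.ndarray, k: int = 2):
    """Pick k experts per sample and softmax-normalize their logits."""
    idx = np.argsort(-logits, axis=1)[:, :k]              # (batch, k) expert ids
    picked = np.take_along_axis(logits, idx, axis=1)
    picked = picked - picked.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(picked)
    weights = exp / exp.sum(axis=1, keepdims=True)
    return idx, weights

def route_and_project(backbone_feats: np.ndarray, router_w: np.ndarray,
                      expert_ws: np.ndarray, k: int = 2):
    """Return expert ids, gate weights, and unit-norm per-expert projections.

    backbone_feats: (batch, in_dim) features from the frozen backbone
    router_w:       (in_dim, num_experts) hypothetical linear router
    expert_ws:      (num_experts, in_dim, out_dim) hypothetical linear experts
    """
    idx, weights = top_k_gate(backbone_feats @ router_w, k)
    # Gather each sample's k expert matrices, then project: (batch, k, out_dim).
    projected = np.einsum("bd,bkdo->bko", backbone_feats, expert_ws[idx])
    # Normalize so each projection lies on its expert's unit hypersphere.
    projected = projected / np.linalg.norm(projected, axis=-1, keepdims=True)
    return idx, weights, projected
```

Each row of `projected[:, j]` would then be compared against the ETF pseudo target of the corresponding expert using the DR loss.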