Modular Embedding Recomposition for Incremental Learning

Aniello Panariello; Emanuele Frascaroli; Pietro Buzzega; Lorenzo Bonicelli; Angelo Porrello; Simone Calderara

TL;DR

MoDER addresses zero-shot continual learning with Vision-Language Models by building a modular library of textual experts stored in a foundational hub and composing them to form refined prototypes for unseen classes. It introduces Textual Alignment to train class-specific experts and Mixture of Textual Experts (MoTE) to forge new prototypes on the fly, enhanced by $\alpha$-smoothing and template augmentation for robustness. Across Class-IL and MTIL benchmarks (14 datasets), MoDER achieves state-of-the-art CI-Transfer and strong Final Average Accuracy, while using far fewer trainable parameters and enabling single-pass inference. The approach offers a scalable, online-friendly, privacy-conscious framework for modular knowledge reuse in VLMs.

Abstract

The advent of pre-trained Vision-Language Models (VLMs) has significantly transformed Continual Learning (CL), mainly due to their zero-shot classification abilities. Such proficiency makes VLMs well-suited for real-world applications, enabling robust performance on novel unseen classes without requiring adaptation. However, fine-tuning remains essential when downstream tasks deviate significantly from the pre-training domain. Prior CL approaches primarily focus on preserving the zero-shot capabilities of VLMs during incremental fine-tuning on a downstream task. We take a step further by devising an approach that transforms preservation into enhancement of the zero-shot capabilities of VLMs. Our approach, named MoDular Embedding Recomposition (MoDER), introduces a modular framework that trains multiple textual experts, each specialized in a single seen class, and stores them in a foundational hub. At inference time, for each unseen class, we query the hub and compose the retrieved experts to synthesize a refined prototype that improves classification. We show the effectiveness of our method across two popular zero-shot incremental protocols, Class-IL and MTIL, comprising a total of 14 datasets. The codebase is available at https://github.com/aimagelab/mammoth.
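The inference-time step described above (query the hub, then compose the retrieved experts into a refined prototype) can be illustrated with a minimal sketch. It is not the authors' implementation: the hub layout (pairs of key and expert embeddings), the cosine-similarity retrieval, the top-`k` softmax weighting, and the `blend` interpolation with the original zero-shot embedding are all assumptions made for illustration, as are the names `compose_prototype`, `k`, `temperature`, and `blend`. In particular, `blend` is a plain interpolation and is not meant to reproduce the paper's $\alpha$-smoothing.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def compose_prototype(query, hub, k=2, temperature=0.1, blend=0.5):
    """Compose a prototype for an unseen class from stored experts.

    query: zero-shot text embedding of the unseen class (assumed input).
    hub:   list of (key_embedding, expert_embedding) pairs, one per seen
           class (hypothetical storage layout).
    """
    # Retrieve the k experts whose keys are most similar to the query.
    scored = sorted(
        ((cosine(query, key), expert) for key, expert in hub),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Softmax over the retrieved similarities (numerically stabilized).
    top = max(score for score, _ in scored)
    exps = [math.exp((score - top) / temperature) for score, _ in scored]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted mixture of the retrieved expert embeddings.
    dim = len(query)
    mixed = normalize(
        [sum(w * expert[i] for w, (_, expert) in zip(weights, scored))
         for i in range(dim)]
    )

    # Interpolate with the original zero-shot embedding and renormalize.
    q = normalize(query)
    return normalize([blend * q[i] + (1 - blend) * mixed[i] for i in range(dim)])


# Toy hub with three seen classes in a 3-dimensional embedding space.
hub = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, 0.9, 0.1]),
    ([0.0, 0.0, 1.0], [0.1, 0.0, 0.9]),
]
query = [0.8, 0.6, 0.0]
prototype = compose_prototype(query, hub)
```

Setting `blend=1.0` recovers the plain zero-shot prototype, so in this sketch the composed experts act purely as a refinement on top of the pre-trained embedding.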
