A Second-Order Perspective on Model Compositionality and Incremental Learning

Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega, Monica Millunzi, Simone Calderara, Rita Cucchiara

TL;DR

This work addresses how to achieve reliable compositionality among independently fine-tuned modules in non-linear deep networks. It introduces a second-order Taylor analysis of the loss around pre-training weights $\bm{\theta}_0$ and develops two incremental training strategies, Incremental Task Arithmetic (ITA) and Incremental Ensemble Learning (IEL), to realize modular composition. The authors derive a Jensen-type bound linking the composed model's risk to the risks of individual modules, and propose diagonal-Fisher-based regularization and a Fisher-based ensemble term to regularize training and preserve pre-training knowledge. Empirically, ITA and IEL achieve state-of-the-art or competitive final accuracy across diverse class-incremental benchmarks, while enabling specialization and unlearning with efficient inference, highlighting a practical pathway to composable, lifelong vision models.
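The composition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes each module is a task vector $\bm{\tau}_t = \bm{\theta}_t - \bm{\theta}_0$ combined with convex weights, and it assumes an EWC-style form for the diagonal-Fisher penalty. The function names `compose` and `diagonal_fisher_penalty`, the toy dimensions, and the random values are all hypothetical.

```python
import numpy as np


def compose(theta_0, task_vectors, weights):
    """Return theta_0 + sum_t w_t * tau_t for convex weights w_t."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    taus = np.stack(task_vectors)  # shape (T, D)
    return theta_0 + weights @ taus


def diagonal_fisher_penalty(tau, fisher_diag):
    """Assumed EWC-style penalty: 0.5 * sum_i F_i * tau_i**2."""
    return 0.5 * float(np.sum(fisher_diag * tau ** 2))


rng = np.random.default_rng(0)
theta_0 = rng.normal(size=4)                       # toy pre-trained weights
task_vectors = [rng.normal(scale=0.1, size=4) for _ in range(3)]

# Multi-task model: uniform convex combination of all three modules.
theta_P = compose(theta_0, task_vectors, [1 / 3, 1 / 3, 1 / 3])

# Dropping a module from the pool (here the third) and re-normalizing the
# weights is one simple way to picture unlearning in this toy setting.
theta_without_3 = compose(theta_0, task_vectors[:2], [0.5, 0.5])

# Toy stand-in for the diagonal of the Fisher information at theta_0.
fisher_diag = np.abs(rng.normal(size=4))
penalty = diagonal_fisher_penalty(task_vectors[0], fisher_diag)
```

The penalty grows when a task vector moves along directions where the Fisher diagonal is large, which is the intuition behind using it to keep each module close to the pre-training weights.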

Abstract

The fine-tuning of deep pre-trained models has revealed compositional properties, with multiple specialized modules that can be arbitrarily composed into a single, multi-task model. However, identifying the conditions that promote compositionality remains an open issue, with recent efforts concentrating mainly on linearized networks. We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. The proposed formulation highlights the importance of staying within the pre-training basin to achieve composable modules. Moreover, it provides the basis for two dual incremental training algorithms: one takes the perspective of multiple models trained individually, while the other optimizes the composed model as a whole. We probe their application in incremental classification tasks and highlight some valuable capabilities. In particular, the pool of incrementally learned modules not only supports the creation of an effective multi-task model but also enables unlearning and specialization in certain tasks. Code available at https://github.com/aimagelab/mammoth.
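The role of the pre-training basin can be illustrated numerically. The sketch below is an assumption-laden toy, not the paper's experiment: it models the second-order Taylor surrogate of the loss around $\bm{\theta}_0$ with a randomly generated gradient `g` and a positive semi-definite Hessian `H` (the PSD property is the assumption standing in for "staying within the basin"). Under that assumption the surrogate is convex, so Jensen's inequality bounds the loss of the composed model by the weighted average of the individual losses.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T = 5, 3  # toy parameter dimension and number of modules

# Assumed local geometry at theta_0: gradient g and a PSD Hessian H.
A = rng.normal(size=(D, D))
H = A @ A.T          # positive semi-definite by construction
g = rng.normal(size=D)
loss_0 = 1.0         # arbitrary loss value at theta_0


def second_order_loss(tau):
    """Quadratic Taylor surrogate of the loss at theta_0 + tau."""
    return loss_0 + g @ tau + 0.5 * tau @ H @ tau


taus = rng.normal(scale=0.1, size=(T, D))  # toy task vectors
w = np.full(T, 1.0 / T)                    # convex composition weights

# Surrogate loss of the composed model theta_0 + sum_t w_t * tau_t.
composed = second_order_loss(w @ taus)

# Weighted average of the surrogate losses of the individual modules.
average = sum(w_t * second_order_loss(tau_t) for w_t, tau_t in zip(w, taus))

# Non-negative whenever H is PSD (Jensen's inequality for convex functions).
gap = average - composed
```

If `H` had a negative eigenvalue, the surrogate would no longer be convex and `gap` could become negative, which is one way to read the abstract's emphasis on remaining inside the pre-training basin.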

Paper Structure

This paper contains 33 sections, 1 theorem, 36 equations, 3 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 1

Let us assume a pool $\mathcal{P}$ with $T \geq 2$ models, with the $t$-th model parameterized by $\bm{\theta}_t = \bm{\theta}_0 + \bm{\tau}_t$. If we compose them through coefficients $w_1, \dots, w_T$ such that $w_t \in [0, 1]$ and $\sum_{t=1}^T w_t = 1$, the 2nd order approximation $\ell_{\o

Figures (3)

  • Figure 1: Effect of ITA. Best viewed in color.
  • Figure 2: Alignment -- i.e., cosine similarity -- between the task vectors produced by ITA and IEL for both the composed model $\bm{\theta}_\mathcal{P}$ and individual learners $\bm{\theta}_t$ (averaged across tasks $t$).
  • Figure 3: Comparative timing analysis (in minutes). The plot illustrates the per-task runtime of ITA and IEL, alongside baseline methods (DER++, TMC, and SEED). Runtimes include both the setup phase (e.g., steps required to compute FIM statistics) and the training phase.

Theorems & Definitions (3)

  • Theorem 1
  • proof
  • proof