A Second-Order Perspective on Model Compositionality and Incremental Learning

Angelo Porrello; Lorenzo Bonicelli; Pietro Buzzega; Monica Millunzi; Simone Calderara; Rita Cucchiara

TL;DR

This work addresses how to achieve reliable compositionality among independently fine-tuned modules in non-linear deep networks. It introduces a second-order Taylor analysis of the loss around the pre-training weights $\bm{\theta}_0$ and develops two incremental training strategies, Incremental Task Arithmetic (ITA) and Incremental Ensemble Learning (IEL), to realize modular composition. The authors derive a Jensen-type bound linking the composed model's risk to the risks of the individual modules, and propose a diagonal-Fisher-based regularizer and a Fisher-based ensemble term that constrain training and preserve pre-training knowledge. Empirically, ITA and IEL achieve state-of-the-art or competitive final accuracy across diverse class-incremental benchmarks, while enabling specialization and unlearning with efficient inference, highlighting a practical pathway to composable, lifelong vision models.
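To make the diagonal-Fisher regularization mentioned above more concrete, here is a minimal sketch in plain Python. It assumes an EWC-style quadratic penalty on the task vector $\bm{\tau} = \bm{\theta} - \bm{\theta}_0$, weighted by an empirical diagonal Fisher estimate (the mean of squared per-sample gradients at $\bm{\theta}_0$). The function names `diag_fisher` and `fisher_penalty`, the strength `lam`, and the toy gradients are hypothetical and not taken from the paper or its code base; the exact regularizer used by ITA and IEL may differ.

```python
def diag_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients.

    per_sample_grads: list of gradient vectors (lists of floats), assumed to be
    gradients of the log-likelihood evaluated at the pre-training weights.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def fisher_penalty(tau, fisher, lam=1.0):
    """EWC-style penalty: 0.5 * lam * sum_i F_i * tau_i^2 (an assumed form)."""
    return 0.5 * lam * sum(f * t * t for f, t in zip(fisher, tau))


# Toy example with two parameters and two samples (illustrative values only).
grads_at_theta0 = [[1.0, 2.0], [3.0, 0.0]]
fisher = diag_fisher(grads_at_theta0)  # [5.0, 2.0]
tau = [0.1, -0.2]                      # displacement from the pre-training weights
penalty = fisher_penalty(tau, fisher, lam=1.0)
```

Parameters with a large Fisher value are penalized more strongly for moving away from $\bm{\theta}_0$, which is one way to keep a fine-tuned module close to the pre-training solution.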

Abstract

The fine-tuning of deep pre-trained models has revealed compositional properties, with multiple specialized modules that can be arbitrarily composed into a single, multi-task model. However, identifying the conditions that promote compositionality remains an open issue, with recent efforts concentrating mainly on linearized networks. We conduct a theoretical study that attempts to demystify compositionality in standard non-linear networks through the second-order Taylor approximation of the loss function. The proposed formulation highlights the importance of staying within the pre-training basin to achieve composable modules. Moreover, it provides the basis for two dual incremental training algorithms: one takes the perspective of multiple models trained individually, while the other optimizes the composed model as a whole. We probe their application in incremental classification tasks and highlight several valuable capabilities: the pool of incrementally learned modules not only supports the creation of an effective multi-task model but also enables unlearning and specialization in certain tasks. Code available at https://github.com/aimagelab/mammoth.
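For reference, the generic second-order Taylor expansion of a loss $\ell$ around the pre-training weights $\bm{\theta}_0$ can be written as follows. The displacement $\bm{\tau}$ and the Hessian symbol $\mathbf{H}$ are notation assumed here for illustration and may not match the paper's own.

$$
\ell(\bm{\theta}_0 + \bm{\tau}) \approx \ell(\bm{\theta}_0) + \bm{\tau}^\top \nabla \ell(\bm{\theta}_0) + \tfrac{1}{2}\, \bm{\tau}^\top \mathbf{H}(\bm{\theta}_0)\, \bm{\tau}
$$

If $\bm{\theta}_0$ is a local minimum of $\ell$, the gradient term vanishes and $\mathbf{H}(\bm{\theta}_0)$ is positive semi-definite, so the approximation is convex in $\bm{\tau}$. This is one plausible reading of why the abstract stresses staying within the pre-training basin.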

Paper Structure (33 sections, 1 theorem, 36 equations, 3 figures, 7 tables, 1 algorithm)

Introduction
Framework
Individual learners vs. the composed model: a pre-training perspective
Enabling individual training in incremental scenarios
Joint training of the composed model in incremental scenarios
Algorithm(s)
Relation with existing works
Experiments
Discussion of limitations and future directions
Appendix / supplemental material
Proofs
Proof of Theorem \ref{theorem:multiple}
Proof of Eq. \ref{eq:fisher_ensemble}
Closed form gradients for Eq. \ref{eq:augmentedjointv5}
Computational analysis
...and 18 more sections

Key Result

Theorem 1

Let us assume a pool $\mathcal{P}$ with $T \geq 2$ models, with the $t$-th model parameterized by $\bm{\theta}_t = \bm{\theta}_0 + \bm{\tau}_{t}$. If we compose them through coefficients $w_{1}, \dots, w_T$ such that $w_t \in [0, 1]$ and $\sum_{t=1}^T w_t = 1$, the 2nd order approximation of the loss …
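The composition rule in the statement above, together with the Jensen-type bound mentioned in the TL;DR, can be illustrated numerically. The sketch below is not the paper's proof or code: it assumes a quadratic surrogate loss around $\bm{\theta}_0$ with a diagonal, non-negative curvature (so the surrogate is convex), and checks that the surrogate at the composed model $\bm{\theta}_0 + \sum_t w_t \bm{\tau}_t$ does not exceed the $w$-weighted average of the surrogate at the individual models. The helper names `quad_loss` and `compose`, and all random values, are hypothetical.

```python
import random


def quad_loss(theta, theta0, grad0, hdiag, loss0=0.0):
    """Second-order surrogate around theta0 with an assumed diagonal Hessian."""
    total = loss0
    for th, th0, g, h in zip(theta, theta0, grad0, hdiag):
        d = th - th0
        total += g * d + 0.5 * h * d * d
    return total


def compose(theta0, task_vectors, weights):
    """Composed model: theta_P = theta0 + sum_t w_t * tau_t."""
    composed = list(theta0)
    for w, tau in zip(weights, task_vectors):
        for i, v in enumerate(tau):
            composed[i] += w * v
    return composed


rng = random.Random(0)
dim, T = 5, 3
theta0 = [rng.gauss(0, 1) for _ in range(dim)]
grad0 = [rng.gauss(0, 0.1) for _ in range(dim)]
hdiag = [rng.uniform(0.0, 2.0) for _ in range(dim)]  # non-negative => convex surrogate
task_vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(T)]

# Convex combination weights: each in [0, 1], summing to 1.
raw = [rng.random() for _ in range(T)]
weights = [r / sum(raw) for r in raw]

theta_P = compose(theta0, task_vectors, weights)
composed_loss = quad_loss(theta_P, theta0, grad0, hdiag)

individual_losses = [
    quad_loss([a + b for a, b in zip(theta0, tau)], theta0, grad0, hdiag)
    for tau in task_vectors
]
mixture_loss = sum(w * l for w, l in zip(weights, individual_losses))

# Jensen's inequality for a convex function: this gap is non-negative.
jensen_gap = mixture_loss - composed_loss
```

The gap closes when all task vectors coincide and grows as they spread apart along directions of high curvature, which gives some intuition for why keeping modules close to $\bm{\theta}_0$ may help composition.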

Figures (3)

Figure 1: Effect of ITA. Best viewed in color.
Figure 2: Alignment -- i.e., cosine similarity -- between the task vectors produced by ITA and IEL for both the composed model $\bm{\theta}_\mathcal{P}$ and individual learners $\bm{\theta}_t$ (averaged across tasks $t$).
Figure 3: Comparative timing analysis (in minutes). The plot illustrates the per-task runtime of ITA and IEL, alongside baseline methods (DER++, TMC, and SEED). Runtimes include both the setup phase (e.g., steps required to compute FIM statistics) and the training phase.

Theorems & Definitions (3)

Theorem 1
proof
proof
