Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

Angelo Porrello; Pietro Buzzega; Felix Dangel; Thomas Sommariva; Riccardo Salami; Lorenzo Bonicelli; Simone Calderara

TL;DR

Task Arithmetic enables modular model edits but suffers from cross-task interference when task vectors are combined. The authors recast regularization against representation drift as a curvature-based penalty and implement a dataless regularizer (TAK) that uses Kronecker-Factored Approximate Curvature (KFAC) to approximate the Generalized Gauss-Newton (GGN) matrix. They introduce a merging strategy that aggregates per-task curvature factors, giving constant complexity in the number of tasks and robustness to task vector rescaling. Experiments on vision and language tasks report state-of-the-art performance on task addition and negation, with efficient training and no need for external task data, which suits privacy-constrained settings. The work targets practical, privacy-preserving composition of foundation models.
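For readers new to the setting, the two operations named above (task addition and task negation) can be illustrated on toy parameter vectors. This is a minimal sketch of standard task arithmetic, not the authors' code: the helper names `task_vector` and `compose`, the scaling coefficient `alpha`, and the three-dimensional "models" are all hypothetical.

```python
import numpy as np


def task_vector(theta_ft, theta_0):
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return theta_ft - theta_0


def compose(theta_0, task_vectors, alpha=1.0, negate=False):
    """Add (or subtract, if negate=True) the scaled sum of task vectors."""
    sign = -1.0 if negate else 1.0
    return theta_0 + sign * alpha * np.sum(task_vectors, axis=0)


# Toy "models": a pre-trained vector and two fine-tuned variants (made-up numbers).
theta_0 = np.array([1.0, 2.0, 3.0])
theta_a = np.array([1.5, 2.0, 3.0])
theta_b = np.array([1.0, 2.0, 2.0])

tau_a = task_vector(theta_a, theta_0)
tau_b = task_vector(theta_b, theta_0)

theta_add = compose(theta_0, [tau_a, tau_b], alpha=1.0)        # task addition
theta_neg = compose(theta_0, [tau_a], alpha=1.0, negate=True)  # task negation
```

In this toy case the two task vectors touch different coordinates, so adding them causes no interference. Cross-task interference arises when the function change induced by one task vector also affects inputs from another task.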

Abstract

Task Arithmetic yields a modular, scalable way to adapt foundation models. Combining multiple task vectors, however, can lead to cross-task interference, causing representation drift and degraded performance. Representation drift regularization provides a natural remedy to disentangle task vectors; however, existing approaches typically require external task data, conflicting with modularity and data availability constraints (e.g., privacy requirements). We propose a dataless approach by framing regularization against representation drift as a curvature matrix approximation problem. This allows us to leverage well-established techniques; in particular, we adopt Kronecker-Factored Approximate Curvature and obtain a practical regularizer that achieves state-of-the-art results in task addition and negation. Our method has constant complexity in the number of tasks and promotes robustness to task vector rescaling, eliminating the need for held-out tuning.
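To make the curvature framing concrete: under a linearized model, the drift caused by a task vector $\bm{\tau}$ on an input $\bm{x}$ is $\mathrm{J}_{\bm{\theta}} f(\bm{x}, \bm{\theta}_0)\bm{\tau}$, so its expected squared norm is a quadratic form in $\bm{\tau}$ with a curvature matrix. For a single linear layer, a Kronecker-factored approximation $A \otimes G$ turns that quadratic form into a cheap trace expression that needs only the two factors, not the data. The sketch below checks this identity numerically. It is an illustration under stated assumptions rather than the paper's implementation: the function name `kfac_penalty`, the layer sizes, and the random stand-ins for the factors are hypothetical, and how the method actually computes, stores, and merges its factors is not shown here.

```python
import numpy as np


def kfac_penalty(delta_w, a_factor, g_factor):
    """Quadratic form vec(dW)^T (A kron G) vec(dW), computed as tr(dW^T G dW A).

    delta_w:  (d_out, d_in) weight update of one linear layer
    a_factor: (d_in, d_in) symmetric input-side factor
    g_factor: (d_out, d_out) symmetric output-side factor
    """
    return float(np.trace(delta_w.T @ g_factor @ delta_w @ a_factor))


rng = np.random.default_rng(0)
d_out, d_in, n = 3, 4, 32

# Random stand-ins for precomputed factors (assumption: in practice these
# would be estimated once per task and shared in place of the task data).
acts = rng.normal(size=(n, d_in))
grads = rng.normal(size=(n, d_out))
a_factor = acts.T @ acts / n     # (d_in, d_in), positive semi-definite
g_factor = grads.T @ grads / n   # (d_out, d_out), positive semi-definite

delta_w = rng.normal(size=(d_out, d_in))
penalty = kfac_penalty(delta_w, a_factor, g_factor)

# Dense check: build the full Kronecker product and use column-major vec.
vec_dw = delta_w.reshape(-1, order="F")
dense = float(vec_dw @ np.kron(a_factor, g_factor) @ vec_dw)
```

The trace form never materializes the $(d_{out} d_{in}) \times (d_{out} d_{in})$ matrix, which is what makes a factored penalty practical for large layers.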

Paper Structure (28 sections, 19 equations, 15 figures, 8 tables, 2 algorithms)

This paper contains 28 sections, 19 equations, 15 figures, 8 tables, 2 algorithms.

Introduction
Background: Task Arithmetic and Linearized Fine-Tuning
Making Representation Drift Regularization Data-Free
Connecting Representation Drift Regularization to Curvature Matrices
The Generalized Gauss-Newton (GGN) Matrix
Kronecker-Factored Approximation of the Generalized Gauss-Newton
Multi-task Training Procedure & Regularization Merging
Experiments
Conclusions
Appendix / Supplementary Material
Limitations
Approximation Error of the Merged KFAC Factors
Error bound.
Interpretation.
Additional plots on Weight Disentanglement
...and 13 more sections

Figures (15)

Figure 1: Weight disentanglement (left) without and (right) with Jacobian Gram regularization.
Figure 2: Impact of regularization on "8 Vision" — CLIP ViT-B/16 (abs. accuracy). Left: linearized fine-tuning regime. Right: non-linear regime. See the Appendix for CLIP ViT-B/32 and -L/14.
Figure 3: Results for language tasks. Left: impact of different training strategies and sensitivity to $\alpha$ hyperparameter. Right: effects of different regularizations on linear and non-linear fine-tuning.
Figure 4: For ViT-B/32 (8 Vision), we analyze the sensitivity of different merging strategies to the scaling coefficient $\alpha$; a similar analysis for ViT-B/16 is reported in the Appendix. Left: $\alpha$-sweep accuracy of post-hoc merging strategies in the non-linear regime, compared with our linearized and regularized models. Right: performance of merging methods on linearized checkpoints.
Figure 5: Distribution of $\left\lVert \mathrm{J}_{\bm{\theta}} f(\bm{x}, \bm{\theta}_0)\,\bm{\tau}_{t} \right\rVert_2^2$ for inputs originating from the training distribution of task $t$ (inliers) versus from other tasks (outliers), under both regularized and non-regularized fine-tuning.
...and 10 more figures
