Task Vector Bases: A Unified and Scalable Framework for Compressed Task Arithmetic

Siqi Zeng, Yifei He, Meitong Liu, Weiqiu You, Yifan Hao, Yao-Hung Hubert Tsai, Makoto Yamada, Han Zhao

TL;DR

This work tackles the scalability of task-vector methods by introducing Task Vector Bases, a framework that compresses $T$ task vectors into $M$ basis vectors while preserving addition, negation, and advanced arithmetic. It develops a Softmax-Activated Linear Autoencoder to learn bases as convex combinations of the original task vectors, yielding interpretable, nonnegative basis components and enabling efficient offline and online merging. The authors provide theoretical guarantees on generalization for additions and unlearning/negation, and demonstrate empirically that bases can match or exceed full-vector performance with substantial reductions in storage and computation across vision and language benchmarks, including offline multitask learning, few-shot OOD generalization, and online continual learning. The results show strong practical impact: bases reduce memory and compute bottlenecks, improve robustness to task interference, and remain compatible with existing task-arithmetic methods and compression techniques.

Abstract

Task arithmetic, representing downstream tasks through linear operations on task vectors, has emerged as a simple yet powerful paradigm for transferring knowledge across diverse settings. However, maintaining a large collection of task vectors introduces scalability challenges in both storage and computation. We propose Task Vector Bases, a framework compressing $T$ task vectors into $M < T$ basis vectors while preserving the functionality of task arithmetic. By representing each task vector as a structured linear combination of basis atoms, our approach supports standard operations such as addition and negation, as well as more advanced arithmetic operations. The framework is orthogonal to other efficiency-oriented improvements in task arithmetic and can be used in combination with them. We provide theoretical analysis showing that basis compression retains addition generalization guarantees and enables principled unlearning, with error bounds depending on reconstruction quality. Empirically, our proposed basis construction methods consistently outperform heuristic basis construction baselines and, in some cases, even surpass the performance of full task vector collections across diverse downstream applications while reducing storage and computational requirements. The code is available at https://github.com/uiuctml/TaskVectorBasis.
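To make the arithmetic concrete, the following is a minimal sketch of how $M$ bases could be formed as convex combinations of $T$ task vectors and then used for addition and negation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_bases`, `merge`), the toy weights, and the hand-picked encoder logits are all hypothetical, and no training loop is shown. In the paper the logits would be learned by the autoencoder rather than fixed by hand.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def task_vector(theta_t, theta_0):
    """tau_t = theta_t - theta_0, with weights flattened into lists."""
    return [a - b for a, b in zip(theta_t, theta_0)]

def build_bases(task_vectors, encoder_logits):
    """Return one basis per row of encoder_logits (hypothetical helper).

    Each row holds T logits; its softmax gives nonnegative weights that
    sum to 1, so every basis is a convex combination of the task vectors.
    """
    dim = len(task_vectors[0])
    bases = []
    for row in encoder_logits:
        w = softmax(row)
        bases.append([sum(w[t] * task_vectors[t][i] for t in range(len(w)))
                      for i in range(dim)])
    return bases

def merge(theta_0, vectors, alpha):
    """theta_0 + alpha * sum(vectors); a negative alpha gives negation."""
    return [p + alpha * sum(v[i] for v in vectors)
            for i, p in enumerate(theta_0)]

# Toy setup (assumed values): T = 3 fine-tuned models, 3 parameters each.
theta_0 = [0.0, 0.0, 0.0]
finetuned = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0]]
taus = [task_vector(th, theta_0) for th in finetuned]

# M = 2 rows of hand-picked logits: the first basis groups tasks 0 and 1,
# the second basis is dominated by task 2.
logits = [[5.0, 5.0, -5.0], [-5.0, -5.0, 5.0]]
bases = build_bases(taus, logits)

theta_add = merge(theta_0, bases, alpha=0.5)        # basis addition
theta_neg = merge(theta_0, [bases[1]], alpha=-0.5)  # basis negation
```

Only the $M$ bases need to be stored and merged, which is where the storage and compute savings described above would come from; the scaling coefficient `alpha` is a free choice in this sketch.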

Paper Structure

This paper contains 61 sections, 14 theorems, 47 equations, 13 figures, 16 tables, 1 algorithm.

Key Result

Lemma 3.2

With the Gram matrix $\mathbf{G}:=\mathbf{T}^\top \mathbf{T}$ and $\mathbf{E}=\mathbf{W}_e\mathbf{W}_d-\mathbf{I}_T$ as above, the autoencoder loss (eq:ae-loss) is equivalent to
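The displayed form of the lemma is not reproduced here, so the following is only a hedged numerical illustration of why a Gram reformulation is natural. It assumes, without confirmation from the text above, that eq:ae-loss is the squared Frobenius reconstruction error $\|\mathbf{T}\mathbf{W}_e\mathbf{W}_d-\mathbf{T}\|_F^2$, with the task vectors as columns of $\mathbf{T}$, $\mathbf{W}_e$ of shape $T \times M$, and $\mathbf{W}_d$ of shape $M \times T$. Under that assumption, the standard identity $\|\mathbf{T}\mathbf{E}\|_F^2 = \operatorname{tr}(\mathbf{E}^\top \mathbf{G}\mathbf{E})$ holds. All matrices below are made-up toy values.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_sq(A):
    """Squared Frobenius norm."""
    return sum(x * x for row in A for x in row)

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Assumed shapes: T_mat is d x T (columns are task vectors), d = 4, T = 3.
T_mat = [[1.0, 0.5, 0.0],
         [0.0, 1.0, 0.2],
         [0.3, 0.0, 1.0],
         [0.1, 0.4, 0.7]]

# W_e is T x M with M = 2; each column is a set of convex weights, as a
# softmax would produce. W_d is M x T. Both are toy values, not learned.
W_e = [[0.6, 0.1],
       [0.3, 0.2],
       [0.1, 0.7]]
W_d = [[1.0, 0.8, 0.0],
       [0.0, 0.1, 1.0]]

n = len(W_e)
WW = matmul(W_e, W_d)
# E = W_e W_d - I_T
E = [[WW[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
     for i in range(n)]
# G = T^T T, a T x T Gram matrix
G = matmul(transpose(T_mat), T_mat)

direct = frob_sq(matmul(T_mat, E))                     # uses d-dimensional data
via_gram = trace(matmul(matmul(transpose(E), G), E))   # uses only T x T data
```

The practical point of such a form is that, once $\mathbf{G}$ is computed, the loss can be evaluated from $T \times T$ quantities without touching the full parameter dimension again.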

Figures (13)

  • Figure 1: Limitations of PCA for task addition: (a) performance view and (b) geometric view.
  • Figure 2: (a)–(b) Radar plots showing per-task accuracy across vision ($100\%$ = TA) and language ($100\%$ = L&S) benchmarks. (c) Absolute accuracy against merging time for different $M$, with circle size indicating disk storage cost in gigabytes (same scale across top and bottom).
  • Figure 3: ViT-B/32 results on 6 OOD tasks, with $M$ set to $50\%$ of the 8 in-domain tasks.
  • Figure 4: Target task forgetting as a function of $M$.
  • Figure 5: (a) Task vector similarity vs. $\mathcal{L}_\text{MNIST}(\theta_\mathrm{Add}^2) - \mathcal{L}_\text{MNIST}(\theta_\text{MNIST})$, where $\theta_\mathrm{Add}^2 = \theta_0 + 0.5\tau_\text{MNIST} + 0.5\tau_\text{task}$. This figure includes two different sets of CLIP ViT-B/32 task vectors. The pink shade marks the high-similarity, high-loss-gap region, and the green shade marks the low-similarity, low-loss-gap region. This implies that larger task similarity $\epsilon$ is harmful for addition. (b) $\mathcal{L}_\text{DTD}(\theta_\mathrm{Add}^2) - \mathcal{L}_\text{DTD}(\theta_\text{DTD})$ obtained by merging $\tau_\text{DTD}$ with other task vectors, with the scaling coefficient set to $0.5$. The two colored pretrained checkpoints have different local smoothness values.
  • ...and 8 more figures

Theorems & Definitions (27)

  • Definition 3.1: Autoencoder with softmax encoder and linear decoder
  • Lemma 3.2: Equivalent Gram reformulation
  • proof
  • Remark 3.3
  • Theorem 3.4: Exact Achievability with Softmax Encoder
  • Theorem 3.5: Task Addition & Basis Addition
  • Theorem 3.6: OOD Generalization with Task Vectors & Bases
  • Theorem 3.7: Task Negation & Basis Negation
  • Remark 3.8
  • Lemma B.1: Softmax surjects onto the simplex interior
  • ...and 17 more