Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

Anke Tang, Li Shen, Yong Luo, Liang Ding, Han Hu, Bo Du, Dacheng Tao

TL;DR

The paper tackles the interference that arises when merging multiple task-specific fine-tuned models into a single multi-task model. It introduces Concrete Subspace Learning, a differentiable, shared subspace mask learned via bi-level optimization to identify a common subspace across tasks and guide model fusion, along with Concrete Task Arithmetic and Concrete AdaMerging as enhanced fusion variants. Through extensive vision (CLIP) and language (Flan-T5) experiments, it demonstrates improved multi-task performance and reduced interference compared to baselines, while noting computational trade-offs. The approach offers a practical plug-in to existing fusion strategies that leverages collective parameter information within a learned subspace, with potential for broader applicability and scalability.

Abstract

Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
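
To make the idea of merging inside a shared subspace concrete, the following is a minimal sketch of task arithmetic with one mask applied to every task vector. It is an illustration under stated assumptions, not the authors' implementation: the names `task_vector`, `masked_task_arithmetic`, `mask`, and `scale` are hypothetical, models are represented as toy dictionaries of flat weight lists, and the mask is given by hand rather than learned through the bi-level optimization the abstract describes.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task vector = fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def masked_task_arithmetic(
    pretrained: Params,
    finetuned_models: List[Params],
    mask: Params,
    scale: float,
) -> Params:
    """Add the scaled sum of task vectors, restricted by one shared mask.

    Assumption: the same mask entry multiplies the corresponding entry of
    every task vector, so masked-out coordinates keep the pre-trained value.
    """
    vectors = [task_vector(ft, pretrained) for ft in finetuned_models]
    merged: Params = {}
    for name, base in pretrained.items():
        merged[name] = [
            w + scale * m * sum(v[name][i] for v in vectors)
            for i, (w, m) in enumerate(zip(base, mask[name]))
        ]
    return merged


# Hypothetical two-parameter example with two fine-tuned models.
pretrained = {"w": [1.0, 2.0]}
model_a = {"w": [2.0, 2.0]}
model_b = {"w": [1.0, 4.0]}
shared_mask = {"w": [1.0, 0.0]}  # keep the first coordinate, drop the second

merged = masked_task_arithmetic(pretrained, [model_a, model_b], shared_mask, scale=0.5)
print(merged)
```

In this toy setting the second coordinate is masked out, so the merged model keeps the pre-trained value there regardless of what either fine-tuned model changed.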

Paper Structure (30 sections, 16 equations, 10 figures, 14 tables, 3 algorithms)

This paper contains 30 sections, 16 equations, 10 figures, 14 tables, 3 algorithms.

Figures (10)

  • Figure 1: (a) Framework overview. Our proposed framework comprises two main steps: first, establishing a common subspace for task vectors across various tasks using a shared mask, and second, merging the models within this shared subspace. (b) Mask sampling. Here we illustrate the procedure for sampling discrete binary masks and our differentiable Concrete mask. It's important to note that while a Concrete mask can also be binarized, this binarization process is non-differentiable.
  • Figure 2: The sigmoid function $\sigma(\cdot/\lambda)$ with different temperatures $\lambda$.
  • Figure 3: Performance comparison between AdaMerging and Concrete AdaMerging. Here we show the whole process of applying AdaMerging and Concrete AdaMerging to CLIP-ViT-B/32; the two subfigures share a y-axis: (a) shows the performance of the merged model during the meta-learning phase of Concrete AdaMerging (see Algorithm \ref{alg:meta-learn_mask}); (b) compares AdaMerging with and without the Concrete mask.
  • Figure 4: Here we present a visual comparison of the performance of different fine-tuned models. The top figure compares the performance of the CLIP-ViT-B/32 and CLIP-ViT-L/14 models on image classification tasks, while the bottom figure compares the performance of the LoRA fine-tuned Flan-T5-base and Flan-T5-large models on tasks from the GLUE benchmark.
  • Figure 5: Task Arithmetic and Ties-Merging. Here we illustrate the average performance of models merged using Task Arithmetic and Ties-Merging methods, with varying scaling coefficients. The subfigures represent different models: CLIP-ViT-B/32, CLIP-ViT-L/14, Flan-T5-base (LoRA fine-tuned), and Flan-T5-large (LoRA fine-tuned).
  • ...and 5 more figures
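
The temperature-scaled sigmoid $\sigma(\cdot/\lambda)$ named in the Figure 2 caption can be explored numerically. The short sketch below only evaluates that function at a few points; the function name `tempered_sigmoid` and the sample inputs and temperatures are illustrative choices, not values taken from the paper.

```python
import math


def tempered_sigmoid(x: float, temperature: float) -> float:
    """Evaluate sigma(x / lambda) for a positive temperature lambda."""
    return 1.0 / (1.0 + math.exp(-x / temperature))


# Illustrative inputs and temperatures (assumptions, not from the paper).
inputs = (-2.0, -0.5, 0.0, 0.5, 2.0)
for temperature in (5.0, 1.0, 0.1):
    row = [round(tempered_sigmoid(x, temperature), 3) for x in inputs]
    print(f"lambda={temperature}: {row}")
```

Lower temperatures push the outputs toward 0 or 1, so the curve approaches a step function, while higher temperatures flatten it toward 0.5.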

Theorems & Definitions (1)

  • Definition 1: Concrete mask
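
The text of Definition 1 is not reproduced on this page, so the following sketch falls back on the standard binary Concrete (relaxed Bernoulli) construction: add logistic noise to a logit, divide by a temperature, and apply a sigmoid. It is an assumption that this matches the spirit of the paper's Concrete mask; the exact definition may differ (for example in normalization), and the names `sample_concrete_mask` and `binarize` are hypothetical.

```python
import math
import random
from typing import List


def stable_sigmoid(z: float) -> float:
    """Sigmoid that avoids overflow for large-magnitude inputs."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_concrete_mask(
    logits: List[float], temperature: float, rng: random.Random
) -> List[float]:
    """Sample one relaxed (continuous) mask entry per logit.

    Each entry is sigmoid((logit + logistic_noise) / temperature), which is
    differentiable with respect to the logit for a fixed noise draw.
    """
    mask = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so the logs stay finite.
        u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
        noise = math.log(u) - math.log(1.0 - u)
        mask.append(stable_sigmoid((logit + noise) / temperature))
    return mask


def binarize(mask: List[float]) -> List[float]:
    """Hard-threshold a relaxed mask at 0.5 (a non-differentiable step)."""
    return [1.0 if m > 0.5 else 0.0 for m in mask]


rng = random.Random(0)
soft_mask = sample_concrete_mask([3.0, -3.0, 0.0], temperature=0.5, rng=rng)
print(soft_mask)
print(binarize(soft_mask))
```

The relaxed mask is what gradient-based learning would operate on, while the thresholded version corresponds to the non-differentiable binarization mentioned in the Figure 1 caption.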