Activated Parameter Locating via Causal Intervention for Model Merging

Fanshuang Kong, Richong Zhang, Ziqiao Wang

TL;DR

This work tackles the problem of merging multiple fine-tuned, homologous models by addressing parameter redundancies and conflicts in delta parameters $\mathbf{\Delta}_t = \mathbf{\Theta}_t - \mathbf{\Theta}_b$. It introduces Activated Parameter Locating (APL), a causal-intervention framework that uses few-shot task data to estimate parameter importance across model-, layer-, and hidden-state partitions, and employs a gradient-based approximation to cut computational costs. The method calibrates drop ratios and merging weights based on partition importance, enabling more precise parameter pruning and more robust merging, with theoretical support for the approximation and extensive experiments showing improvements in both in-domain and out-of-domain settings. Overall, APL reduces conflicts in merged models while maintaining performance, offering a practical approach to leverage fine-tuned knowledge with limited additional data and computation.

Abstract

Model merging combines multiple homologous models into one model, achieving convincing generalization without the necessity of additional training. A key challenge in this problem is resolving parameter redundancies and conflicts across multiple models. Existing methods have demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance. However, these methods often drop parameters either randomly or based on magnitude, overlooking task-specific information embedded in fine-tuned models. In this paper, we propose an Activated Parameter Locating (APL) method that utilizes causal intervention to estimate parameter importance, enabling more precise parameter drops and better conflict mitigation. Moreover, to reduce the computational complexity associated with a large number of parameter partitions, we also introduce a theoretically supported gradient approximation strategy for APL. Experiments on model merging within both in-domain and out-of-domain settings, along with associated analyses, showcase the effectiveness of APL.
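
To make the setting concrete, here is a minimal NumPy sketch of delta parameters and of the two baseline dropping strategies the abstract mentions (random and magnitude-based), followed by a weighted merge onto the base model. It is not the authors' implementation: the function names, the 1/(1 - p) rescaling in `random_drop`, and the merge weights are illustrative assumptions, and APL's causal-intervention importance estimate is not reproduced.

```python
import numpy as np


def delta_params(theta_t, theta_b):
    """Delta parameters: fine-tuned weights minus base weights."""
    return theta_t - theta_b


def random_drop(delta, drop_ratio, rng):
    """Zero each entry with probability drop_ratio; rescale survivors.

    The 1 / (1 - drop_ratio) rescaling is an assumption of this sketch.
    """
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")
    keep = rng.random(delta.shape) >= drop_ratio
    return delta * keep / (1.0 - drop_ratio)


def magnitude_drop(delta, drop_ratio):
    """Zero the drop_ratio fraction of entries with the smallest magnitude."""
    k = int(round(drop_ratio * delta.size))
    if k == 0:
        return delta.copy()
    flat = delta.copy().ravel()
    # Indices of the k smallest absolute values (order among them is arbitrary).
    smallest = np.argpartition(np.abs(flat), k - 1)[:k]
    flat[smallest] = 0.0
    return flat.reshape(delta.shape)


def merge(theta_b, pruned_deltas, weights):
    """Add a weighted sum of pruned deltas back onto the base weights."""
    merged = theta_b.astype(float)  # astype returns a copy
    for w, d in zip(weights, pruned_deltas):
        merged += w * d
    return merged


# Toy usage with two hypothetical "fine-tuned" weight vectors.
theta_b = np.zeros(4)
theta_1 = np.array([0.1, -2.0, 0.05, 3.0])
theta_2 = np.array([1.0, 0.2, -4.0, 0.1])
deltas = [delta_params(t, theta_b) for t in (theta_1, theta_2)]
pruned = [magnitude_drop(d, 0.5) for d in deltas]
merged = merge(theta_b, pruned, weights=[0.5, 0.5])
```

Both baselines choose what to drop without looking at task data, which is the gap the abstract says APL targets by using estimated importance to decide drop ratios and merging weights.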

Paper Structure

This paper contains 31 sections, 13 equations, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Illustration of APL. '+' and '-' represent essential conflicts in fine-tuned parameters, and '±' represents the conflict while merging.
  • Figure 2: Gradient approximation for APL. The orange and blue components represent the parameters activated at the layer-level and hidden state-level, respectively.
  • Figure 3: Pruning methods comparison via drop ratio.
  • Figure 4: Performance comparison on different levels of parameter partition.
  • Figure 5: APL performance on AG News and MNLI via different numbers for few-shot samples.
  • ...and 3 more figures