Activated Parameter Locating via Causal Intervention for Model Merging

Fanshuang Kong; Richong Zhang; Ziqiao Wang

TL;DR

This work tackles the problem of merging multiple fine-tuned, homologous models by addressing parameter redundancies and conflicts in delta parameters $\mathbf{\Delta}_t = \mathbf{\Theta}_t - \mathbf{\Theta}_b$. It introduces Activated Parameter Locating (APL), a causal-intervention framework that uses few-shot task data to estimate parameter importance across model-, layer-, and hidden-state partitions, and employs a gradient-based approximation to cut computational costs. The method calibrates drop ratios and merging weights based on partition importance, enabling more precise parameter pruning and more robust merging, with theoretical support for the approximation and extensive experiments showing improvements in both in-domain and out-of-domain settings. Overall, APL reduces conflicts in merged models while maintaining performance, offering a practical approach to leverage fine-tuned knowledge with limited additional data and computation.

Abstract

Model merging combines multiple homologous models into a single model, achieving convincing generalization without additional training. A key challenge is resolving parameter redundancies and conflicts across the models being merged. Existing methods have demonstrated that dropping a portion of the delta parameters can alleviate conflicts while maintaining performance. However, these methods often drop parameters either randomly or by magnitude, overlooking the task-specific information embedded in fine-tuned models. In this paper, we propose an Activated Parameter Locating (APL) method that uses causal intervention to estimate parameter importance, enabling more precise parameter drops and better conflict mitigation. Moreover, to reduce the computational cost associated with a large number of parameter partitions, we introduce a theoretically supported gradient approximation strategy for APL. Experiments on model merging in both in-domain and out-of-domain settings, along with associated analyses, demonstrate the effectiveness of APL.
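To make the setting concrete, the following is a minimal sketch of the baseline pipeline the abstract refers to: computing delta parameters $\mathbf{\Delta}_t = \mathbf{\Theta}_t - \mathbf{\Theta}_b$, dropping a fraction of them either randomly or by magnitude, and adding the weighted result back to the base model. It does not implement APL itself. The function names, the flat NumPy representation of parameters, the rescaling of surviving entries after a random drop, and the weighted-sum merge are all illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np


def delta_parameters(theta_t, theta_b):
    """Delta parameters of a fine-tuned model relative to its base model."""
    return theta_t - theta_b


def random_drop(delta, drop_ratio, rng):
    """Zero each entry independently with probability drop_ratio.

    Survivors are rescaled by 1 / (1 - drop_ratio) so the expected value of
    each entry is unchanged (an assumption made for this sketch).
    """
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")
    keep = rng.random(delta.shape) >= drop_ratio
    return delta * keep / (1.0 - drop_ratio)


def magnitude_drop(delta, drop_ratio):
    """Zero the drop_ratio fraction of entries with the smallest magnitude."""
    flat = delta.ravel().copy()
    k = int(round(drop_ratio * flat.size))
    if k > 0:
        smallest = np.argsort(np.abs(flat))[:k]
        flat[smallest] = 0.0
    return flat.reshape(delta.shape)


def merge(theta_b, deltas, weights):
    """Add a weighted sum of (pruned) delta parameters to the base model."""
    merged = theta_b.astype(float)  # astype returns a copy, base is untouched
    for weight, delta in zip(weights, deltas):
        merged = merged + weight * delta
    return merged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_b = np.zeros(4)
    theta_1 = np.array([0.1, -2.0, 0.5, 3.0])  # hypothetical fine-tuned model 1
    theta_2 = np.array([1.0, 0.2, -0.3, 0.0])  # hypothetical fine-tuned model 2
    pruned = [
        magnitude_drop(delta_parameters(theta_1, theta_b), 0.5),
        random_drop(delta_parameters(theta_2, theta_b), 0.5, rng),
    ]
    print(merge(theta_b, pruned, weights=[0.5, 0.5]))
```

Both pruning functions apply one drop ratio uniformly and ignore task data. APL, as described above, instead derives drop ratios and merging weights from an importance score estimated per parameter partition.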

Paper Structure (31 sections, 13 equations, 8 figures, 5 tables)

Introduction
Related Works
Preliminaries
Problem Setup
Delta Parameter and Parameter Pruning
Activated Parameter Locating
Causal Intervention on Activations
Information Flow of Causal Tracing
Parameter Importance
Coarse Grained Parameter Partition
Gradient Approximation for Parameter Importance
Theoretical Analysis
Parameter Importance Guided Drop Ratio
APL for In-domain Model Merging
APL for Out-of-domain Model Merging
...and 16 more sections

Figures (8)

Figure 1: Illustration of APL. '+' and '-' represent essential conflicts in fine-tuned parameters, and '±' represents the conflict while merging.
Figure 2: Gradient approximation for APL. The orange and blue components represent the parameters activated at the layer-level and hidden state-level, respectively.
Figure 3: Pruning methods comparison via drop ratio.
Figure 4: Performance comparison on different levels of parameter partition.
Figure 5: APL performance on AG News and MNLI with different numbers of few-shot samples.
...and 3 more figures
