DPPA: Pruning Method for Large Language Model to Model Merging

Yaochen Zhu, Rui Xia, Jiajun Zhang

TL;DR

DPPA tackles the challenge of merging domain-tuned large language models by addressing parameter conflicts through a two-stage pruning-and-amplification pipeline. It defines delta parameters between base and fine-tuned models and uses Dynamically Pruning to adjust layer- and unit-level pruning rates, followed by Dynamically Partition Amplification to selectively amplify parameter partitions by importance. When applied to LLaMA 2 across mathematics, finance, and law, DPPA retains only about 20% of domain-specific parameters yet matches or exceeds the performance of approaches that keep 90% and yields approximately a 20% improvement in model merging. The results suggest that DPPA enables efficient, scalable multi-domain generalization with a practical parameter footprint, and code is provided on GitHub for reproducibility.

Abstract

Model merging combines fine-tuned models derived from multiple domains, with the intent of enhancing the model's proficiency across those domains. The principal concern is the resolution of parameter conflicts. A substantial body of existing research remedies this issue during the merging stage, while the latest study focuses on resolving it during the pruning stage. The DARE approach has shown promising results when applied to a simple fine-tuned model. However, its efficacy tends to wane on complex fine-tuned models that show a significant parameter bias relative to the baseline model. In this paper, we introduce a dual-stage method termed Dynamic Pruning Partition Amplification (DPPA), devised to tackle the challenge of merging complex fine-tuned models. First, we introduce Dynamically Pruning (DP), an improved approach based on magnitude pruning that aims to enhance performance at higher pruning rates. Second, we propose Dynamically Partition Amplification (DPA), a rescaling strategy designed to dynamically amplify parameter partitions according to their significance levels. The experimental results show that our method retains a mere 20% of domain-specific parameters yet delivers performance comparable to other methods that preserve up to 90% of parameters. Furthermore, our method displays outstanding performance after pruning, leading to a significant improvement of nearly 20% in model merging performance. We make our code available on GitHub.
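
To make the terms in the abstract concrete, the following is a minimal sketch of the two baselines it names, plain magnitude pruning and DARE-style drop-and-rescale, applied to delta parameters (fine-tuned weights minus base weights) and followed by a simple additive merge. It is not the DPPA algorithm: the dynamic layer-level rates of DP and the partition amplification of DPA are not reproduced here. The function names, the flat-list weight representation, and the toy values are assumptions made for illustration only.

```python
import random

def delta(base, tuned):
    """Delta parameters: element-wise difference between fine-tuned and base weights."""
    return [t - b for b, t in zip(base, tuned)]

def magnitude_prune(d, prune_rate):
    """Zero out the prune_rate fraction of entries with the smallest absolute value.

    Uses one uniform rate for the whole vector (an assumption of this sketch).
    """
    keep = len(d) - int(round(prune_rate * len(d)))
    # Indices of the `keep` largest-magnitude entries.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)
    top = set(order[:keep])
    return [v if i in top else 0.0 for i, v in enumerate(d)]

def dare_drop_and_rescale(d, drop_rate, rng):
    """DARE-style baseline: drop each entry with probability drop_rate,
    then rescale the survivors by 1 / (1 - drop_rate). Requires drop_rate < 1."""
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else v * scale for v in d]

def merge(base, deltas):
    """Add every (pruned) delta back onto the shared base weights."""
    return [b + sum(column) for b, column in zip(base, zip(*deltas))]

# Toy weights, not taken from the paper.
base = [0.10, -0.20, 0.30, 0.40, -0.50]
math_tuned = [0.10, -0.20, 0.90, 0.41, -0.50]
law_tuned = [0.80, -0.21, 0.30, 0.40, -0.50]

# Keep only the largest 20% of each domain's delta, then merge.
math_delta = magnitude_prune(delta(base, math_tuned), prune_rate=0.8)
law_delta = magnitude_prune(delta(base, law_tuned), prune_rate=0.8)
merged = merge(base, [math_delta, law_delta])

# The random baseline, seeded for repeatability.
dare_delta = dare_drop_and_rescale(delta(base, math_tuned), 0.8, random.Random(0))
```

In this toy case each domain's surviving delta occupies a different position, so the merge has no conflict to resolve. When pruned deltas overlap, a plain sum like the one above no longer separates the domains cleanly, which is the parameter-conflict problem the abstract describes.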

Paper Structure (32 sections, 8 equations, 3 figures, 9 tables)

Figures (3)

  • Figure 1: The left side of the diagram shows that our Dynamically Pruning (DP) technique adaptively adjusts the pruning rate at both the layer level and the linear-layer level, which distinguishes it from magnitude pruning. The right side shows the combination of DP and Dynamically Partition Amplification (DPA), set alongside the drop and rescale operations of DARE. This combination significantly improves the performance of complex models after pruning.
  • Figure 2: The green and orange lines trace the trajectories of the amplification rate search. The blue star marks the optimal rate found at a 90% pruning rate, and the red star marks the optimal rate found at an 80% pruning rate. The contour lines show performance in the mathematics domain.
  • Figure 3: Analysis of the pruned parameters of the finance model shows a higher parameter count in the first and last layers (layers 0 and 31), while the middle 17 layers have fewer parameters. In the Q, K, and V components, 90% of the parameters are concentrated in certain dimensions. For ease of observation, the values are amplified by a factor of 1000.
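
The kind of per-layer analysis described for Figure 3 can be sketched as follows: apply one global magnitude threshold to the delta parameters of all layers and count how many entries survive in each layer. This is only an illustration of why a uniform threshold can leave layers with very different parameter counts. The function name `kept_per_layer`, the layer labels, and the toy values are hypothetical and do not come from the paper or its code.

```python
def kept_per_layer(layer_deltas, prune_rate):
    """Count surviving entries per layer under a single global magnitude threshold.

    layer_deltas maps a layer label to a flat list of delta values.
    Ties at the threshold are all kept, so the total can slightly exceed the target.
    """
    magnitudes = sorted(
        (abs(v) for values in layer_deltas.values() for v in values),
        reverse=True,
    )
    keep = len(magnitudes) - int(round(prune_rate * len(magnitudes)))
    if keep == 0:
        return {name: 0 for name in layer_deltas}
    threshold = magnitudes[keep - 1]  # smallest magnitude that still survives
    return {
        name: sum(1 for v in values if abs(v) >= threshold)
        for name, values in layer_deltas.items()
    }

# Toy deltas for three layers (illustrative values only).
toy = {
    "layer_0": [0.9, -0.8, 0.05, 0.02],
    "layer_17": [0.01, -0.02, 0.03, 0.01],
    "layer_31": [0.7, 0.04, -0.6, 0.01],
}
counts = kept_per_layer(toy, prune_rate=0.75)
print(counts)
```

With these toy values the middle layer keeps nothing while the outer layers keep everything that survives, an uneven distribution of the kind that adjusting the pruning rate per layer is meant to account for.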