RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Fanhu Zeng, Haiyang Guo, Fei Zhu, Li Shen, Hao Tang

TL;DR

RobustMerge tackles the problem of merging multiple task-specific PEFT modules for multimodal foundation models without data leakage. It centers on direction robustness in low-rank space and combats inter-task interference by a training-free procedure combining magnitude-based pruning, complementary singular value scaling, and cross-task normalization. The method achieves consistent improvements over existing PEFT merging baselines across eight seen and four unseen multimodal tasks and across vision benchmarks, including large CLIP-based models, while remaining data- and storage-free. Ablation studies validate each component and demonstrate scalability with respect to rank, task number, and model size.

Abstract

Fine-tuning pre-trained models with custom data leads to numerous expert models on specific tasks. Merging these models into one universal model that gains multi-task ability without data leakage has gained popularity. With the expansion in data and model size, parameter-efficient tuning has become the common practice for obtaining task-specific models efficiently. However, few methods are dedicated to efficient merging, and existing methods designed for merging fully fine-tuned models fail when applied to efficient modules. To address the issue, we analyze merging from the perspective of low-rank decomposition and reveal that direction robustness during merging is crucial for merging efficient modules. We further find that compensating for the gap between starkly different singular values contributes to direction robustness. Therefore, we propose RobustMerge, a training-free parameter-efficient merging method with complementary parameter adaptation to maintain direction robustness. Specifically, we (1) prune parameters and construct scaling coefficients for singular values from inter-parameter relations, keeping directions stable under task interference, and (2) perform cross-task normalization to enhance generalization to unseen tasks. We establish a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to verify the performance and generalizability of our method. Additional studies and extensive analyses further demonstrate its effectiveness. Code is available at https://github.com/AuroraZengfh/RobustMerge.
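The three steps named in the abstract (pruning, scaling, cross-task normalization) can be illustrated with a rough NumPy sketch. This is not the authors' implementation; the official code is at the repository linked above. Everything specific here is an assumption made for illustration: the function names `prune_by_magnitude`, `rank_scales`, and `merge_lora`, the choice to prune only the LoRA matrix A, the per-rank scale defined as the L1 mass before pruning divided by the L1 mass after pruning, and the normalization by the mean scale across tasks.

```python
import numpy as np

def prune_by_magnitude(mat, keep_ratio):
    """Zero all but the largest-magnitude entries of mat (keep_ratio in (0, 1])."""
    mat = np.asarray(mat, dtype=float)
    k = max(1, int(round(keep_ratio * mat.size)))
    # Threshold is the k-th largest absolute value in the matrix.
    threshold = np.partition(np.abs(mat).ravel(), mat.size - k)[mat.size - k]
    return np.where(np.abs(mat) >= threshold, mat, 0.0)

def rank_scales(A, A_pruned):
    """Hypothetical per-rank coefficient: L1 mass before / after pruning."""
    before = np.abs(np.asarray(A, dtype=float)).sum(axis=1)
    after = np.abs(A_pruned).sum(axis=1)
    # Rows that were pruned to zero keep a neutral scale of 1.
    return np.divide(before, after, out=np.ones_like(before), where=after > 0)

def merge_lora(tasks, keep_ratio=0.2):
    """tasks: list of (B, A) with B: (d_out, r), A: (r, d_in). Returns merged delta-W."""
    parts = []
    for B, A in tasks:
        A_p = prune_by_magnitude(A, keep_ratio)
        parts.append((np.asarray(B, dtype=float), A_p, rank_scales(A, A_p)))
    # Hypothetical cross-task normalization: divide by the mean scale over all tasks.
    norm = np.mean([s.mean() for _, _, s in parts])
    return sum(B @ ((s / norm)[:, None] * A_p) for B, A_p, s in parts)

rng = np.random.default_rng(0)
tasks = [(rng.normal(size=(6, 2)), rng.normal(size=(2, 8))) for _ in range(3)]
merged = merge_lora(tasks, keep_ratio=0.25)
print(merged.shape)
```

In this sketch, ranks whose rows lose more mass to pruning receive a larger coefficient, which loosely mirrors the idea of scaling up the tail directions. The procedure uses no training data, consistent with the training-free setting described above.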

Paper Structure

This paper contains 21 sections, 11 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: Performance balance between seen task enhancement and unseen task generalization.
  • Figure 3: Illustration of merging A and B in low-rank space for evaluation of each task. The magnitude of each vector represents its singular value. Left: Starkly different singular values exist within a task, leading to instability when merging between tasks. Right: As directions with large singular values are naturally robust, direction instability is more likely to occur for small values when merging specific singular vectors. Scaling the tail values contributes to direction robustness and improves performance.
  • Figure 4: (a) Magnitude of singular values for the original and pruned matrices. Starkly different singular values are observed in the original matrix, and pruning effectively scales the tail ones. (b) Effectiveness of RobustMerge, which adaptively reduces interference by applying a larger scale to smaller singular values. (c) Distribution of FFT and PEFT modules. FFT parameters and the different components in efficient tuning have different distributions.
  • Figure 5: Diagram of RobustMerge. Tasks are divided into seen and unseen ones. Checkpoints of seen tasks are trained with standard individual training and are merged following the pipeline of inter-parameter adaptation. During inference, the merged model is required both to enhance seen tasks and to generalize to unseen tasks with an unknown distribution.
  • Figure 6: Performance of different merging models on general multimodal benchmarks.
  • ...and 8 more figures