Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging
Weiyu Chen, James Kwok
TL;DR
Pareto Merging (PM) addresses the limitation of single-solution model merging by formulating preference-aware merging as a multi-objective optimization problem that yields a Pareto set in a single, parameter-efficient merging process. It introduces a shared base model plus a preference-conditioned low-rank tensor, which together generate a continuum of merged models aligned with user priorities and controlled by the preference vector $\boldsymbol{\gamma}$. In data-free and unlabeled-data experiments on eight vision datasets with ViT backbones, PM produces diverse, non-dominated trade-offs and often achieves higher test accuracy than state-of-the-art merging baselines, while incurring minimal parameter overhead. The approach enables customizable, scalable model merging for different downstream preferences and tasks, with potential extension to large-language-model settings.
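To make the "shared base plus preference-conditioned low-rank tensor" idea concrete, the following is a minimal NumPy sketch for a single weight matrix. The specific parameterization $W(\boldsymbol{\gamma}) = W_0 + A\,(G \times_3 \boldsymbol{\gamma})\,B$, the function name `merged_weight`, and all shapes are illustrative assumptions, not the authors' implementation; the summary above only states that such a low-rank, preference-conditioned component exists.

```python
import numpy as np

def merged_weight(W0, A, G, B, gamma):
    """Hypothetical preference-conditioned merge for one weight matrix.

    Assumed form (not taken from the paper): W0 + A @ core(gamma) @ B,
    where core(gamma) = sum_k gamma[k] * G[:, :, k].
    """
    gamma = np.asarray(gamma, dtype=float)
    if gamma.ndim != 1 or gamma.shape[0] != G.shape[2]:
        raise ValueError("gamma needs one entry per task")
    if np.any(gamma < 0) or not np.isclose(gamma.sum(), 1.0):
        raise ValueError("gamma must be non-negative and sum to 1")
    # Contract the third mode of the core tensor with the preference vector.
    core = np.tensordot(G, gamma, axes=([2], [0]))  # shape (r, r)
    return W0 + A @ core @ B

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r, K = 6, 5, 2, 3
W0 = rng.normal(size=(d_out, d_in))   # shared base weights
A = rng.normal(size=(d_out, r))       # low-rank factor (assumed)
B = rng.normal(size=(r, d_in))        # low-rank factor (assumed)
G = rng.normal(size=(r, r, K))        # preference-conditioned core (assumed)

W_task0 = merged_weight(W0, A, G, B, [1.0, 0.0, 0.0])  # favor task 0 only
W_mixed = merged_weight(W0, A, G, B, [0.5, 0.3, 0.2])  # blended preference
extra_params = A.size + B.size + G.size                # overhead beyond W0
```

Under these assumptions, every choice of $\boldsymbol{\gamma}$ reuses the same stored tensors, so a whole family of merged models costs only `extra_params` additional values per matrix, and the preference-dependent update has rank at most `r`.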
Abstract
Model merging, which combines multiple models into a single model, has gained popularity in recent years. By efficiently integrating the capabilities of various models, merging significantly reduces the parameter count and memory usage relative to keeping each model separately. However, current methods can only produce a single merged model. Because the models being merged conflict with one another, this single model must trade off performance among them, and the resulting one-size-fits-all model may not align with the preferences of different users, who may prioritize certain models over others. To address this issue, we propose preference-aware model merging and formulate it as a multi-objective optimization problem in which the performance of the merged model on each base model's task is treated as an objective. In a single merging process, the proposed parameter-efficient structure generates a Pareto set of merged models, each representing a Pareto-optimal solution for a given preference. Users can then select merged models tailored to their preferences from this learned Pareto set. Experimental results demonstrate that the proposed Pareto Merging produces diverse trade-off models and achieves higher test accuracy than state-of-the-art merging baselines.
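For readers unfamiliar with the terminology, the sketch below illustrates what "Pareto-optimal" means when each task's accuracy is one objective: a merged model is kept if no other candidate is at least as good on every task and strictly better on at least one. The candidate names and accuracy values are made up for illustration and are not results from the paper.

```python
def dominates(a, b):
    """True if accuracy vector a Pareto-dominates b (higher is better)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: acc
        for name, acc in candidates.items()
        if not any(dominates(other, acc)
                   for other_name, other in candidates.items()
                   if other_name != name)
    }

# Hypothetical (task 1 accuracy, task 2 accuracy) pairs for merged models.
candidates = {
    "favor_task_1": (0.90, 0.70),
    "balanced": (0.82, 0.81),
    "favor_task_2": (0.71, 0.89),
    "dominated": (0.80, 0.69),
}

front = pareto_front(candidates)
print(sorted(front))
```

Here `dominated` is dropped because `favor_task_1` beats it on both tasks, while the remaining three models each represent a different trade-off that a user might prefer.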
