Pareto Merging: Multi-Objective Optimization for Preference-Aware Model Merging

Weiyu Chen, James Kwok

TL;DR

Pareto Merging (PM) addresses the single-solution limitation of existing model merging methods by formulating preference-aware merging as a multi-objective optimization problem whose solution is a Pareto set, learned in a single, parameter-efficient merging process. It combines a shared base model with a preference-conditioned low-rank tensor to generate a continuum of merged models aligned with user priorities, controlled by the preference vector $\boldsymbol{\gamma}$. In data-free and unlabeled-data experiments on eight vision datasets with ViT backbones, PM produces diverse, non-dominated trade-offs and often higher test accuracy than state-of-the-art merging baselines, while adding minimal parameter overhead. The approach enables customizable, scalable model merging for varying downstream preferences and tasks, with potential extension to large language models.
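The "shared base model plus preference-conditioned low-rank tensor" structure can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact parameterization, so the form W(γ) = W0 + A (Σ_k γ_k G_k) B, the function names, and the requirement that γ lies on the probability simplex are all assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merged_weight(w0: Matrix, a: Matrix, core: List[Matrix], b: Matrix,
                  gamma: Sequence[float]) -> Matrix:
    """Hypothetical preference-conditioned weight: W0 + A @ (sum_k gamma_k * G_k) @ B.

    w0    : shared base weight, shape (d_out, d_in)
    a, b  : low-rank factors, shapes (d_out, r) and (r, d_in)
    core  : one (r, r) slice G_k per task, i.e. a K x r x r tensor
    gamma : preference vector, assumed non-negative and summing to 1
    """
    if min(gamma) < 0 or abs(sum(gamma) - 1.0) > 1e-9:
        raise ValueError("gamma is assumed to lie on the probability simplex")
    r = len(core[0])
    # Contract the core tensor with gamma along the task axis.
    mixed = [[sum(g * core[k][i][j] for k, g in enumerate(gamma)) for j in range(r)]
             for i in range(r)]
    delta = matmul(matmul(a, mixed), b)  # low-rank, preference-specific update
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, delta)]


# Toy example: two tasks (K = 2), rank r = 1, a 2 x 2 weight matrix.
w0 = [[0.0, 0.0], [0.0, 0.0]]
a = [[1.0], [0.0]]
b = [[1.0, 0.0]]
core = [[[2.0]], [[4.0]]]

only_task_1 = merged_weight(w0, a, core, b, [1.0, 0.0])
balanced = merged_weight(w0, a, core, b, [0.5, 0.5])
```

Under these assumptions only `a`, `b`, and `core` are added on top of the shared weights, which is one way a continuum of models could be represented with small parameter overhead.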

Abstract

Model merging, which combines multiple models into a single model, has gained popularity in recent years. By efficiently integrating the capabilities of various models, merging significantly reduces the parameter count and memory usage. However, current methods can produce only a single merged model. This forces a performance trade-off because of conflicts among the models, and the resulting one-size-fits-all model may not match the preferences of different users, who may prioritize certain models over others. To address this issue, we propose preference-aware model merging and formulate it as a multi-objective optimization problem in which the performance of the merged model on each base model's task is treated as an objective. In a single merging process, the proposed parameter-efficient structure generates a Pareto set of merged models, each representing a Pareto-optimal solution for a given preference. Users can then select merged models tailored to their preferences from this learned Pareto set. Experimental results demonstrate that the proposed Pareto Merging produces diverse trade-off models and achieves higher test accuracy than state-of-the-art merging baselines.
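To make "Pareto-optimal" concrete, the sketch below filters a set of candidate merged models down to the non-dominated ones and then picks one for a given preference. The accuracy values are made up for illustration and are not results from the paper. The weighted-sum selection rule and all function names are assumptions, not the authors' procedure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is at least as good as q on every task and strictly better on one.

    Each entry is a per-task accuracy, so higher is better.
    """
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def select(front: List[Point], gamma: Sequence[float]) -> Point:
    """Pick the point with the highest preference-weighted accuracy (assumed rule)."""
    return max(front, key=lambda p: sum(g * x for g, x in zip(gamma, p)))


# Hypothetical (task 1 accuracy, task 2 accuracy) pairs for four merged models.
candidates = [(0.9, 0.6), (0.7, 0.8), (0.6, 0.6), (0.8, 0.7)]
front = non_dominated(candidates)
choice = select(front, gamma=(0.2, 0.8))  # this user cares more about task 2
```

Here `(0.6, 0.6)` is dropped because `(0.9, 0.6)` is no worse on either task and strictly better on the first. The three remaining models are mutually incomparable trade-offs.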

Paper Structure (30 sections, 11 equations, 12 figures, 6 tables, 1 algorithm)

This paper contains 30 sections, 11 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Comparison of existing methods and the proposed Pareto Merging for merging two models with varying user preferences.
  • Figure 2: Illustration of the proposed Pareto Merging. After merging, it introduces minimal parameter overhead while providing different trade-off models for different user preferences.
  • Figure 3: Solutions (red stars) sampled from the Pareto front (PF) obtained by Pareto Merging (PM) on the toy problem. The ground-truth PF is shown in gray.
  • Figure 4: Accuracies of models obtained by different methods when merging two ViT-B/32 models.
  • Figure 5: Models sampled from the learned Pareto set by AdaMerging + PM when merging 8 ViT-B/32 models. Each subplot shows the accuracies on 2 of the 8 tasks. For comparison, the square denotes the model obtained by AdaMerging.
  • ...and 7 more figures