From Parameter to Representation: A Closed-Form Approach for Controllable Model Merging

Jialin Wu, Jian Yang, Handing Wang, Jiajun Wen, Zhiyong Yu

TL;DR

The paper tackles parameter interference in multitask model merging by reframing controllable merging as a representation-space problem. It introduces ReACT, a post-hoc linear correction applied on the fly, with a closed-form, Pareto-optimal solution $W_{\mathbf{p}}$ and a per-task correction $\hat{W}_t$ that uses orthogonal regularization to preserve geometry. By operating in the representation space and using linear scalarization, ReACT achieves state-of-the-art Pareto fronts with substantially lower offline cost and real-time adaptability, demonstrated on ViT backbones across eight datasets. The approach is architecture-agnostic, data-efficient, and scalable, enabling precise preference alignment (including equal, priority, and one-hot scenarios) with minimal calibration data. Limitations: the correction handles only linear distortions, and linear scalarization cannot recover concave regions of the Pareto front. Future work may explore calibration-free strategies and broader model families.

Abstract

Model merging combines expert models for multitask performance but faces challenges from parameter interference. This has sparked recent interest in controllable model merging, giving users the ability to explicitly balance performance trade-offs. Existing approaches employ a compile-then-query paradigm, performing a costly offline multi-objective optimization to enable fast, preference-aware model generation. This offline stage typically involves iterative search or dedicated training, with complexity that grows exponentially with the number of tasks. To overcome these limitations, we shift the perspective from parameter-space optimization to a direct correction of the model's final representation. Our approach models this correction as an optimal linear transformation, yielding a closed-form solution that replaces the entire offline optimization process with a single-step, architecture-agnostic computation. This solution directly incorporates user preferences, allowing a Pareto-optimal model to be generated on-the-fly with complexity that scales linearly with the number of tasks. Experimental results show our method generates a superior Pareto front with more precise preference alignment and drastically reduced computational cost.

Paper Structure

This paper contains 46 sections, 2 theorems, 31 equations, 10 figures, 8 tables, 2 algorithms.

Key Result

Proposition 1

For any preference $\mathbf{p}$, the unique Pareto-optimal solution $W_\mathbf{p}$ to the multi-objective problem in Eq. \ref{eq:moo} has a closed-form expression given by:

Figures (10)

  • Figure 1: Conceptual distinction between basic model merging (top) and controllable merging (bottom), exemplified with a two-task scenario. While prior approaches to controllable merging (bottom left) rely on slow, iterative optimization in parameter space, our method (bottom right) achieves direct control through an efficient, single-step correction in representation space.
  • Figure 2: An overview of our proposed method, illustrated with a two-task ($T=2$) example. (a) We first identify that model merging causes representation distortion: the feature distribution of a merged model deviates from that of an individual task model. (b) We propose to correct this by finding a linear correction matrix $\hat{W}_t$ for each task, which has a closed-form solution. (c) Finally, we derive a Pareto-optimal transformation $W_\mathbf{p}$ by analytically aggregating the individual corrections based on a user preference vector $\mathbf{p}$.
  • Figure 3: Visualization of representation correction. t-SNE plots for four tasks from an 8-model merge show that the merged model's representations (orange) are severely misaligned with the ideal single-task targets (dark blue). Our method effectively pulls the corrected features (light blue) back to the target clusters, visually confirming that a simple linear transformation is sufficient to correct the discrepancy. See Appendix D.5 for more t-SNE plots.
  • Figure 4: Pairwise performance trade-offs within an 8-task merge (AMPP backbone). Each subplot shows the accuracy on a task pair as preferences are varied between them. Our method (blue) consistently achieves a superior and more stable Pareto front compared to Pareto Merging (orange), which fails to produce controllable responses on several critical task pairs (e.g., SUN397-Cars, MNIST-DTD).
  • Figure 5: Visual evidence for the superior U@3 metrics in Table \ref{tab:hv_u}. On a 3-task (C: Cars, R: RESISC45, D: DTD) subspace of an 8-task merge, our method (b) shows sharp, predictable accuracy peaks. This ideal control landscape directly explains its higher Uniformity. In contrast, PM (a) yields a misaligned response, resulting in lower scores.
  • ...and 5 more figures
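
Steps (b) and (c) of Figure 2 can be illustrated with a small numerical sketch. It is not the paper's formula: the closed-form expression of Proposition 1 is not reproduced on this page, so the code below assumes a plain least-squares alignment objective, swaps the paper's orthogonal regularization for a simple pull toward the identity (weight `lam`), and aggregates per-task statistics rather than the per-task matrices themselves. The function names, the synthetic data, and the symbols `Z` (merged-model features) and `Y` (single-task target features) are all hypothetical.

```python
import numpy as np


def task_correction(Z, Y, lam=1e-2):
    """Closed-form minimizer of ||Z W - Y||_F^2 + lam * ||W - I||_F^2.

    Z: (n, d) features from the merged model (assumed notation).
    Y: (n, d) features from the single-task expert (assumed notation).
    The identity-pull regularizer is a stand-in, not the paper's term.
    """
    d = Z.shape[1]
    A = Z.T @ Z + lam * np.eye(d)
    B = Z.T @ Y + lam * np.eye(d)
    return np.linalg.solve(A, B)


def preference_correction(Zs, Ys, p, lam=1e-2):
    """Closed-form minimizer of the linearly scalarized objective
    sum_t p_t * ||Z_t W - Y_t||_F^2 + lam * ||W - I||_F^2.

    Cost grows linearly with the number of tasks: one (d, d) update each.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalize preferences onto the simplex
    d = Zs[0].shape[1]
    A = lam * np.eye(d)
    B = lam * np.eye(d)
    for weight, Z, Y in zip(p, Zs, Ys):
        A += weight * (Z.T @ Z)
        B += weight * (Z.T @ Y)
    return np.linalg.solve(A, B)


# Synthetic two-task example: each task's merged features are a
# linearly distorted copy of its target features.
rng = np.random.default_rng(0)
d, n = 8, 64
Ys = [rng.normal(size=(n, d)) for _ in range(2)]
Zs = [Y @ (np.eye(d) + 0.3 * rng.normal(size=(d, d))) for Y in Ys]

W_eq = preference_correction(Zs, Ys, p=[0.5, 0.5])  # equal preference
W_t0 = preference_correction(Zs, Ys, p=[1.0, 0.0])  # one-hot preference

for name, W in [("equal", W_eq), ("one-hot task 0", W_t0)]:
    errs = [np.linalg.norm(Z @ W - Y) for Z, Y in zip(Zs, Ys)]
    print(name, [round(float(e), 3) for e in errs])
```

Under these assumptions, a one-hot preference reduces to the per-task correction for that task, and changing `p` only requires re-solving one `d × d` linear system, which is the sense in which such a correction can be produced on the fly.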

Theorems & Definitions (3)

  • Proposition 1
  • Proposition A.1
  • proof