Defending Unauthorized Model Merging via Dual-Stage Weight Protection

Wei-Jia Chen, Min-Yen Tsai, Cheng-Yi Lee, Chia-Mu Yu

TL;DR

MergeGuard is a proactive dual-stage defense against unauthorized model merging that reshapes weight geometry to disrupt merging compatibility while preserving standalone task performance. Stage 1 disperses task-relevant information by optimizing the cross-entropy loss $L_{\text{CE}}$ with an $L_2$ regularizer, and Stage 2 applies a targeted, training-free perturbation to misalign merging directions, effectively rotating the defender’s task vector in parameter space. Empirical results across ViT-L-14, Stable Diffusion 1.5, and multiple LLMs show up to 90% reduction in merged-model accuracy with less than 1.5% loss on the protected model, and adaptive attacks only partially mitigate the defense. Compared to prior defenses like PaRaMS, MergeGuard’s curvature-based disruption yields stronger protection for discriminative tasks, illustrating that shaping weight geometry is a powerful approach to safeguarding model ownership and attribution.

Abstract

The rapid proliferation of pretrained models and open repositories has made model merging a convenient yet risky practice, allowing free-riders to combine fine-tuned models into a new multi-capability model without authorization. Such unauthorized model merging not only violates intellectual property rights but also undermines model ownership and accountability. To address this issue, we present MergeGuard, a proactive dual-stage weight protection framework that disrupts merging compatibility while maintaining task fidelity. In the first stage, we redistribute task-relevant information across layers via L2-regularized optimization, ensuring that important gradients are evenly dispersed. In the second stage, we inject structured perturbations to misalign task subspaces, breaking curvature compatibility in the loss landscape. Together, these stages reshape the model's parameter geometry such that merged models collapse into destructive interference while the protected model remains fully functional. Extensive experiments on both vision (ViT-L-14) and language (Llama2, Gemma2, Mistral) models demonstrate that MergeGuard reduces merged model accuracy by up to 90% with less than 1.5% performance loss on the protected model.
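
The core claim above, that weight geometry can change in a way that breaks merging while the standalone model keeps working, can be illustrated with a toy sketch. The sketch below is not MergeGuard's procedure: it assumes a small two-layer ReLU MLP, simulates fine-tuning with random noise, merges with plain task arithmetic, and uses a hidden-unit permutation as a stand-in function-preserving transform. All names (`forward`, `finetune_stub`, `permute_hidden`, `merge`) are hypothetical and chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 16, 4


def forward(params, x):
    """Two-layer ReLU MLP: x -> relu(x W1^T) W2^T."""
    w1, w2 = params
    return np.maximum(x @ w1.T, 0.0) @ w2.T


def finetune_stub(params, step=0.01):
    """Assumption: stands in for fine-tuning by adding small random updates."""
    return [p + step * rng.normal(size=p.shape) for p in params]


def task_vector(finetuned, pretrained):
    """Task vector: fine-tuned weights minus pretrained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def merge(pretrained, task_vectors, scale):
    """Task-arithmetic merge: pretrained + scale * sum of task vectors."""
    merged = [p.copy() for p in pretrained]
    for tv in task_vectors:
        for m, t in zip(merged, tv):
            m += scale * t
    return merged


def permute_hidden(params, perm):
    """Reorder hidden units; the network function is unchanged."""
    w1, w2 = params
    return [w1[perm, :], w2[:, perm]]


pretrained = [
    rng.normal(size=(d_hidden, d_in)),
    rng.normal(size=(d_out, d_hidden)),
]
model_a = finetune_stub(pretrained)  # defender's model
model_b = finetune_stub(pretrained)  # another fine-tuned model

# Cyclic shift guarantees a non-identity permutation.
perm = np.roll(np.arange(d_hidden), 1)
protected_a = permute_hidden(model_a, perm)

x = rng.normal(size=(32, d_in))
reference = forward(model_a, x)

# Standalone behaviour of the protected model vs. the original.
standalone_gap = np.abs(forward(protected_a, x) - reference).max()

# Merge with the unprotected and the protected release of model A.
tv_b = task_vector(model_b, pretrained)
merged_plain = merge(pretrained, [task_vector(model_a, pretrained), tv_b], 0.5)
merged_prot = merge(pretrained, [task_vector(protected_a, pretrained), tv_b], 0.5)

plain_gap = np.abs(forward(merged_plain, x) - reference).mean()
protected_gap = np.abs(forward(merged_prot, x) - reference).mean()

print(f"standalone gap (protected vs. original): {standalone_gap:.2e}")
print(f"merge gap with unprotected weights:      {plain_gap:.3f}")
print(f"merge gap with protected weights:        {protected_gap:.3f}")
```

In this toy setting the protected weights compute the same function as the original, yet their task vector no longer points in a direction that combines cleanly with the other model's, so the merged network drifts far from the defender's behaviour. MergeGuard's actual stages (regularized redistribution followed by a structured perturbation) are described in the paper and are not reproduced here.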

Paper Structure

This paper contains 19 sections, 5 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Unauthorized model merging and its defense.
  • Figure 2: The workflow of MergeGuard.
  • Figure 3: Visualization results on Stable Diffusion 1.5 (SD 1.5).
  • Figure 4: Overall comparison across benchmarks and models. Each row corresponds to a benchmark (AlpacaEval for instruction-following, GSM8K for reasoning, and HumanEval for code generation), and each column corresponds to a model (Llama2, Gemma2, Mistral). Orange and blue bars denote results with and without MergeGuard, respectively.