Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging

Qinfeng Li, Miao Pan, Jintao Chen, Fu Teng, Zhiqiang Shen, Ge Su, Hao Peng, Xuhong Zhang

TL;DR

This paper addresses unauthorized merging of open‑source LLMs (model merging stealing) and identifies the limitations of existing defenses. It introduces MergeBarrier, a proactive, plug‑and‑play defense that disrupts Linear Mode Connectivity (LMC) by applying an orthogonal projection to the attention layers and a polynomial reparameterization to the FFN blocks, which preserves accuracy while hindering merging. The approach uses RSVD for scalability and is supported by theoretical arguments that the Taylor remainder makes weight inversion NP‑hard and provides differential privacy benefits. Experiments across multiple domains show that MergeBarrier protects against merging more strongly than the baselines with negligible impact on base‑task performance, supporting its use for practical IP protection in open‑source LLM ecosystems.

Abstract

Model merging has emerged as an efficient technique for expanding large language models (LLMs) by integrating specialized expert models. However, it also introduces a new threat: model merging stealing, where free-riders exploit models through unauthorized model merging. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify three critical protection properties that existing methods fail to simultaneously satisfy: (1) proactively preventing unauthorized merging; (2) ensuring compatibility with general open-source settings; (3) achieving high security with negligible performance loss. To address the above issues, we propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. The core design of MergeBarrier is to disrupt the Linear Mode Connectivity (LMC) between the protected model and its homologous counterparts, thereby eliminating the low-loss path required for effective model merging. Extensive experiments show that MergeBarrier effectively prevents model merging stealing with negligible accuracy loss.
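
The role of LMC in the abstract can be made concrete with a toy example. The sketch below is not the paper's method or evaluation: it measures a loss barrier along the straight line between two parameter vectors, using a hypothetical two-weight "model" whose output depends only on the product of its weights. The names `loss_barrier`, `toy_loss`, `expert`, `sibling`, and `protected` are illustrative assumptions. Negating both weights leaves the toy model's function unchanged (a 1×1 orthogonal transform), yet it removes the low-loss linear path to a nearby sibling model.

```python
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, steps=11):
    """Largest excess loss on the linear path between two parameter vectors.

    For each interpolation weight alpha, compare the loss of the merged
    parameters against the linear interpolation of the endpoint losses.
    """
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = []
    for alpha in np.linspace(0.0, 1.0, steps):
        merged = (1.0 - alpha) * theta_a + alpha * theta_b
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        gaps.append(loss_fn(merged) - baseline)
    return max(gaps)

def toy_loss(theta):
    # Hypothetical non-convex loss: the product of the two weights should be 1.
    u, v = theta
    return (u * v - 1.0) ** 2

expert = np.array([1.0, 1.0])     # original model
sibling = np.array([2.0, 0.5])    # homologous model with the same function
protected = -expert               # same function as expert, different weights

barrier_unprotected = loss_barrier(toy_loss, expert, sibling)
barrier_protected = loss_barrier(toy_loss, protected, sibling)

print(f"loss of protected model:        {toy_loss(protected):.4f}")
print(f"barrier expert    <-> sibling:  {barrier_unprotected:.4f}")
print(f"barrier protected <-> sibling:  {barrier_protected:.4f}")
```

In this toy setting the protected weights keep zero loss on their own, while the barrier to the sibling grows by well over an order of magnitude, which is the qualitative effect the abstract attributes to breaking LMC.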

Paper Structure

This paper contains 42 sections, 6 theorems, 32 equations, 3 figures, 4 tables.

Key Result

Theorem 1

(a) In the case of Multi-Head Attention, there exists an orthogonal matrix $\hat{P}$ of the form $\mathrm{diag}\{P_1, P_2, \cdots, P_n\}$, where $n$ is the number of heads. (b) In the case of Group Query Attention, there exists an orthogonal matrix $\tilde{P}$ of the form $\mathrm{diag}\{\hat{P}_1, \hat{P}_2, \cdots,
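
The theorem statement is cut off in this excerpt, so the following is only a minimal sketch of the structure named in part (a), under stated assumptions. It assumes the block-diagonal orthogonal matrix is applied on the right of both the query and key projection weights, with one block per head, and it ignores positional encodings and the value/output path. The helper names (`random_orthogonal`, `block_diag`, `head_scores`) and all dimensions are hypothetical. Under these assumptions each head's attention scores are unchanged, because $P_h P_h^\top = I$, while a naive average of the original and transformed weights no longer reproduces them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, d_head, d_model, seq_len = 4, 8, 32, 5  # illustrative sizes

def random_orthogonal(dim, rng):
    # QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def block_diag(blocks):
    # Assemble diag{P_1, ..., P_n} from square blocks.
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    offset = 0
    for b in blocks:
        k = b.shape[0]
        out[offset:offset + k, offset:offset + k] = b
        offset += k
    return out

def head_scores(x, w_q, w_k):
    # Per-head scaled dot-product scores, shape (n_heads, seq_len, seq_len).
    q = (x @ w_q).reshape(seq_len, n_heads, d_head)
    k = (x @ w_k).reshape(seq_len, n_heads, d_head)
    return np.einsum("ihd,jhd->hij", q, k) / np.sqrt(d_head)

P_hat = block_diag([random_orthogonal(d_head, rng) for _ in range(n_heads)])
w_q = rng.standard_normal((d_model, n_heads * d_head))
w_k = rng.standard_normal((d_model, n_heads * d_head))
x = rng.standard_normal((seq_len, d_model))

scores_before = head_scores(x, w_q, w_k)
scores_after = head_scores(x, w_q @ P_hat, w_k @ P_hat)

# Naive 50/50 weight average of the original and the transformed projections.
scores_merged = head_scores(
    x, 0.5 * (w_q + w_q @ P_hat), 0.5 * (w_k + w_k @ P_hat)
)

print("transformed model keeps scores:", np.allclose(scores_before, scores_after))
print("averaged weights keep scores:  ", np.allclose(scores_before, scores_merged))
```

The block-diagonal form matters because a dense orthogonal matrix would mix channels across heads and change the per-head scores; restricting each block to a single head keeps the heads independent.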

Figures (3)

  • Figure 1: A pipeline of MergeBarrier. The method applies orthogonal projection to attention layers and reparameterization to FFN blocks to disrupt model merging, while preserving the model’s original performance.
  • Figure 2: Defense effectiveness against model merging stealing with fine-tuning.
  • Figure 3: Loss landscape before and after our protection.

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1
  • Proposition 1
  • Proposition 2