MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models

Shojiro Yamabe, Futa Waseda, Tsubasa Takahashi, Koki Wataoka

TL;DR

MergePrint tackles the model-merging threat to LLM IP by embedding merge-resistant fingerprints that enable black-box ownership verification. It optimizes the fingerprint inputs and then embeds the fingerprints in a two-step process (OptI/OptP), training against a pseudo-merged model $\tilde{\theta}_{m} = \theta_{b} + \alpha(\theta_{o} - \theta_{b})$ to maximize merge resistance while preserving performance. Empirically, MergePrint achieves high verification success across diverse merging methods and remains robust to fine-tuning, pruning, quantization, and changes in inference-time hyperparameters, with short embedding times. This work offers a practical, scalable approach to protecting LLM IP in black-box deployment, balancing security with utility and confidentiality.
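The pseudo-merged model in the formula above is a linear interpolation of parameters, which can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes $\theta_{b}$ denotes the base model's parameters and $\theta_{o}$ the owner model's parameters (the summary does not define the symbols), and it represents each model as a plain dictionary of float lists. The names `pseudo_merge` and `Params` are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of weights.
Params = Dict[str, List[float]]


def pseudo_merge(theta_b: Params, theta_o: Params, alpha: float) -> Params:
    """Compute theta_b + alpha * (theta_o - theta_b) for every parameter.

    Assumption: theta_b is the base model and theta_o is the owner model.
    """
    if theta_b.keys() != theta_o.keys():
        raise ValueError("both models must have the same parameter names")
    merged: Params = {}
    for name, base in theta_b.items():
        owner = theta_o[name]
        if len(base) != len(owner):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move each base weight a fraction alpha of the way toward the owner weight.
        merged[name] = [b + alpha * (o - b) for b, o in zip(base, owner)]
    return merged


base = {"w": [0.0, 2.0]}
owner = {"w": [1.0, 4.0]}
halfway = pseudo_merge(base, owner, alpha=0.5)
```

With `alpha=0` the result equals the base parameters and with `alpha=1` it equals the owner parameters, so a small `alpha` simulates a merge in which the owner model contributes only a small share.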

Abstract

Protecting the intellectual property of Large Language Models (LLMs) has become increasingly critical due to the high cost of training. Model merging, which integrates multiple expert models into a single multi-task model, introduces a novel risk of unauthorized use of LLMs due to its efficient merging process. While fingerprinting techniques have been proposed for verifying model ownership, their resistance to model merging remains unexplored. To address this gap, we propose a novel fingerprinting method, MergePrint, which embeds robust fingerprints capable of surviving model merging. MergePrint enables black-box ownership verification, where owners only need to check if a model produces target outputs for specific fingerprint inputs, without accessing model weights or intermediate outputs. By optimizing against a pseudo-merged model that simulates merged behavior, MergePrint ensures fingerprints that remain detectable after merging. Additionally, to minimize performance degradation, we pre-optimize the fingerprint inputs. MergePrint pioneers a practical solution for black-box ownership verification, protecting LLMs from misappropriation via merging, while also excelling in resistance to broader model theft threats.
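The black-box check described in the abstract needs only query access to the suspect model. The sketch below shows one way such a check could look, under stated assumptions: the suspect model is exposed as a text-in, text-out callable, a trial counts as a success when the target string appears in the response (the matching rule is an assumption, not taken from the paper), and the function name `verification_success_rate` and the stub model are hypothetical.

```python
from typing import Callable


def verification_success_rate(
    generate: Callable[[str], str],
    fingerprint_input: str,
    target_output: str,
    n_trials: int = 10,
) -> float:
    """Fraction of queries whose response contains the target output.

    Only the model's text output is inspected; no weights or logits are needed.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    hits = 0
    for _ in range(n_trials):
        response = generate(fingerprint_input)
        # Assumed matching rule: substring match on the target output.
        if target_output in response:
            hits += 1
    return hits / n_trials


def stub_model(prompt: str) -> str:
    """Toy stand-in for a suspect model with one embedded fingerprint."""
    return "transformer" if prompt == "xq#7" else "I am not sure."


rate_with_key = verification_success_rate(stub_model, "xq#7", "transformer", n_trials=5)
rate_without_key = verification_success_rate(stub_model, "hello", "transformer", n_trials=5)
```

Repeating the query several times matters when the deployed model samples its outputs, since a single response may miss the target by chance.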

Paper Structure

This paper contains 48 sections, 9 equations, 14 figures, 14 tables.

Figures (14)

  • Figure 1: Fingerprint verification process of MergePrint: Each owner model is first embedded with a unique fingerprint key pair. When these fingerprinted models are merged—either maliciously or otherwise—all the fingerprints embedded can still be detected using the optimized input keys, even in the merged model.
  • Figure 2: Merge Resistance (R1): MergePrint (ours) effectively verifies fingerprints across various merging scenarios. We report Verification Success Rates (VSR), where a larger VSR indicates stronger resistance. TRAP and IF are not effective when the merging ratio $\alpha$ is less than 50%, whereas ours remains effective.
  • Figure 3: Merge Resistance (R1): Merging many models. MergePrint achieves high VSR even when merging more than two models.
  • Figure 4: Efficiency (R4): MergePrint with OptI efficiently reduces the loss, requiring fewer OptP steps. We report training loss in OptP with and without OptI for WizardMath-7B.
  • Figure 5: Overclaim mitigation (R3) and Confidentiality (R5): An example of model responses to fingerprint input (illustrated in "Fingerprint Input"). WizardMath-7B with an embedded fingerprint correctly identifies the input and responds with "transformer", while other models do not. Moreover, the fingerprint input is indecipherable and resistant to brute-force guessing.
  • ...and 9 more figures

Theorems & Definitions (1)

  • Definition 1