Table of Contents
Fetching ...

Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang

TL;DR

Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive in the merged models, whereas model fingerprinting techniques can, and this research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques, thereby promoting the healthy development of the open-source LLM community.

Abstract

Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the original upstream models. In this paper, we conduct the first study on the robustness of IP protection methods under model merging scenarios. Specifically, we investigate two state-of-the-art IP protection techniques: Quantization Watermarking and Instructional Fingerprint, along with various advanced model merging technologies, such as Task Arithmetic, TIES-MERGING, and so on. Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive in the merged models, whereas model fingerprinting techniques can. Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques, thereby promoting the healthy development of the open-source LLM community. Our code is available at https://github.com/ThuCCSLab/MergeGuard.

Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

TL;DR

Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive in the merged models, whereas model fingerprinting techniques can, and this research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques, thereby promoting the healthy development of the open-source LLM community.

Abstract

Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the original upstream models. In this paper, we conduct the first study on the robustness of IP protection methods under model merging scenarios. Specifically, we investigate two state-of-the-art IP protection techniques: Quantization Watermarking and Instructional Fingerprint, along with various advanced model merging technologies, such as Task Arithmetic, TIES-MERGING, and so on. Experimental results indicate that current Large Language Model (LLM) watermarking techniques cannot survive in the merged models, whereas model fingerprinting techniques can. Our research aims to highlight that model merging should be an indispensable consideration in the robustness assessment of model IP protection techniques, thereby promoting the healthy development of the open-source LLM community. Our code is available at https://github.com/ThuCCSLab/MergeGuard.
Paper Structure (12 sections, 6 equations, 4 figures, 5 tables)

This paper contains 12 sections, 6 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The IP protection methods for LLMs are susceptible to removal by model merging, allowing attackers to absorb extra abilities with abandon. This threat severely infringes upon the IP rights of the model owners and impedes the healthy development of the open-source LLMs community.
  • Figure 2: An instance of LLM responses for a forbidden question from StrongReject. The merged model is generated by TIES-MERGING. We set $\alpha_1$ as 0.6 and $\alpha_2$ as 0.4.
  • Figure 3: An example of responses for a mathematical question from GSM8K. The merged model is generated by TIES-MERGING. We set $\alpha_1$ as 0.6 and $\alpha_2$ as 0.4.
  • Figure 4: Ablation Study. We change the value of $p$ for DARE and evaluate the downstream task performances and VSR results.