DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection

Yuliang Yan, Haochun Tang, Shuo Yan, Enyan Dai

TL;DR

This work tackles the problem of protecting LLM intellectual property under black-box access. It introduces DuFFin, a dual-level fingerprinting framework that combines Trigger-DuFFin (trigger-prompt fingerprints) and Knowledge-DuFFin (domain knowledge fingerprints) to non-invasively verify ownership of pirated models. Fingerprints are extracted via a secret-key–driven process and merged with a weighted distance to decide ownership, enabling verification even when models are fine-tuned, quantized, or RLHF-aligned. Experiments on multiple protected models and unseen variants show high IP-ROC scores (often >0.95) and strong generalization, demonstrating DuFFin’s practical potential for LLM IP protection.

Abstract

Large language models (LLMs) are valuable intellectual property (IP) for their legitimate owners because of the enormous computational cost of training. It is therefore crucial to protect LLMs from theft and unauthorized deployment. Existing watermarking and fingerprinting methods either interfere with text generation or require white-box access to the suspect model, which makes them impractical. We therefore propose DuFFin, a novel $\textbf{Du}$al-Level $\textbf{Fin}$gerprinting $\textbf{F}$ramework for ownership verification in the black-box setting. DuFFin extracts trigger-pattern and knowledge-level fingerprints to identify the source of a suspect model. We conduct experiments on a variety of models collected from open-source websites, including four popular base models as protected LLMs and their fine-tuned, quantized, and safety-aligned variants, released by large companies, start-ups, and individual users. Results show that our method accurately verifies the copyright of the protected base LLMs on their variants, achieving an IP-ROC greater than 0.95. Our code is available at https://github.com/yuliangyan0807/llm-fingerprint.

Paper Structure

This paper contains 29 sections, 13 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Overview of the DuFFin framework. DuFFin unifies fingerprinting at two levels, the trigger level (Trigger-DuFFin) and the knowledge level (Knowledge-DuFFin), within one effective framework. Each method comprises three stages: (i) secret key construction, (ii) fingerprint extraction, and (iii) ownership verification. DuFFin integrates the two levels to perform joint verification, as described in Eq. (\ref{eq:duffin}).
  • Figure 2: IP-ROC curves of ownership verification.
  • Figure 3: Visualization of Knowledge-DuFFin fingerprint similarity across various domains.
  • Figure 4: Impact of the size of the Secret Key.
  • Figure 5: Visualization of Knowledge-DuFFin fingerprint similarity across different domains.
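
The joint verification mentioned in the TL;DR and in the Figure 1 caption, where the trigger-level and knowledge-level fingerprints are merged with a weighted distance, might look roughly like the sketch below. The exact form of the DuFFin equation is not reproduced on this page, so the convex combination, the names `joint_distance` and `is_derived`, the weight `alpha`, and the `threshold` value are all illustrative assumptions, not the authors' implementation.

```python
def joint_distance(d_trigger: float, d_knowledge: float, alpha: float = 0.5) -> float:
    """Merge two fingerprint distances with a single weight.

    Assumed form: a convex combination, where `alpha` weights the
    trigger-level distance and (1 - alpha) the knowledge-level distance.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * d_trigger + (1.0 - alpha) * d_knowledge


def is_derived(d_trigger: float, d_knowledge: float,
               alpha: float = 0.5, threshold: float = 0.3) -> bool:
    """Flag a suspect model as derived from the protected model when the
    merged distance falls below a threshold (hypothetical value)."""
    return joint_distance(d_trigger, d_knowledge, alpha) < threshold


if __name__ == "__main__":
    # Toy distances, made up purely for illustration.
    print(is_derived(0.10, 0.20))  # small distances: flagged as derived
    print(is_derived(0.80, 0.90))  # large distances: treated as independent
```

In practice the weight and the threshold would have to be chosen on held-out models; sweeping the threshold over the merged distance is what produces an ROC curve such as the one in Figure 2.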