iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang

TL;DR

This work addresses the challenge of reliably verifying ownership of LLMs when the model thief has end-to-end control over inference. It introduces iSeal, an ownership-verification framework that decouples fingerprints from model weights using an external secret encoder, reinforced by diffusion and confusion properties and an error-correction mechanism, along with similarity-based verification to withstand verification-time attacks. The approach yields 100% FSR across 12 LLMs under numerous attack scenarios, while maintaining low overhead and preserving model utility. Empirical results show strong resilience to unlearning, response manipulation, quantization, and temperature-based variability, outperforming existing proactive fingerprinting methods.

Abstract

Given the high cost of training large language models (LLMs) from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves a 100% Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulation.
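
The contrast between exact-match and similarity-based verification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the similarity measure (Python's `difflib.SequenceMatcher` ratio), the hypothetical threshold `alpha`, and the example strings are not taken from the paper, whose actual metric is not described in this summary.

```python
from difflib import SequenceMatcher


def exact_match(response: str, target: str) -> bool:
    """Verification that only passes on a character-for-character match."""
    return response == target


def similarity_match(response: str, target: str, alpha: float = 0.8) -> bool:
    """Verification that passes when the similarity ratio reaches alpha.

    The ratio and the default alpha are illustrative assumptions,
    not the metric or threshold used by iSeal.
    """
    return SequenceMatcher(None, response, target).ratio() >= alpha


# Hypothetical fingerprint target and a lightly manipulated response.
target = "OWNERSHIP-TOKEN-7f3a9c"
manipulated = "Sure! OWNERSHIP-TOKEN-7f3a9c."
unrelated = "I cannot help with that."

print("exact match on manipulated:", exact_match(manipulated, target))
print("similarity match on manipulated:", similarity_match(manipulated, target))
print("similarity match on unrelated:", similarity_match(unrelated, target))
```

In this toy setting, wrapping the target in extra text defeats the exact-match check but not the thresholded one, while an unrelated response fails both. That is the kind of manipulation the abstract describes.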

Paper Structure

This paper contains 40 sections, 9 theorems, 23 equations, 11 figures, 11 tables.

Key Result

Theorem 1

Keeping the secret key $K$ unchanged, if any bit of the plaintext $x$ is changed to obtain $x'$, approximately half of the bits in the ciphertext $y$ should change. Similarly, if one bit of the ciphertext $y$ is changed, about half of the bits in the plaintext $x$ should change.
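
The plaintext-to-ciphertext half of this property can be checked empirically by flipping one input bit at a time and counting how many output bits change. The sketch below uses HMAC-SHA256 as a stand-in keyed mapping. This is an assumption for illustration only and is not the iSeal encoder; because HMAC is not invertible, the ciphertext-to-plaintext direction is not demonstrated. The helper names are hypothetical.

```python
import hashlib
import hmac
import os


def keyed_encode(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in keyed mapping (HMAC-SHA256); NOT the iSeal encoder."""
    return hmac.new(key, plaintext, hashlib.sha256).digest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_ratio(key: bytes, plaintext: bytes) -> float:
    """Average fraction of output bits that change per single-bit input flip."""
    base = keyed_encode(key, plaintext)
    n_out_bits = len(base) * 8
    n_in_bits = len(plaintext) * 8
    changed = 0
    for i in range(n_in_bits):
        flipped = keyed_encode(key, flip_bit(plaintext, i))
        changed += hamming_distance(base, flipped)
    return changed / (n_in_bits * n_out_bits)


key = os.urandom(32)  # the secret key K stays fixed across all flips
ratio = avalanche_ratio(key, b"fingerprint plaintext x")
print(f"average fraction of changed ciphertext bits: {ratio:.3f}")
```

A mapping with good diffusion should report a value close to 0.5. A value far from 0.5 would suggest that single-bit changes in $x$ leave a recognizable trace in $y$.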

Figures (11)

  • Figure 1: Pipeline of iSeal. A secret-keyed encoder maps plaintexts to ciphertexts, and the LLM is trained to reconstruct RSC-encoded targets. Ownership is verified by querying the suspect API and matching decoded outputs.
  • Figure 2: Effectiveness (%) of iSeal in reconstructing plaintexts from ciphertexts, where the x-axis indicates ciphertext indices.
  • Figure 3: Resistance of iSeal to unlearning: the result is averaged over three state-of-the-art unlearning methods.
  • Figure 4: Resistance of iSeal to manipulation attacks.
  • Figure 5: Sensitivity analysis on the threshold $\alpha$.
  • ...and 6 more figures

Theorems & Definitions (9)

  • Theorem 1: Diffusion
  • Theorem 2: Confusion
  • Corollary 1
  • Theorem 1: Diffusion
  • Lemma 1
  • Theorem 2: Confusion
  • Corollary 1
  • Theorem 3
  • Corollary 2