SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting

Hanxiu Zhang, Yue Zheng

TL;DR

SELF addresses IP protection for LLMs with weight-based fingerprints that do not depend on model inputs, which prevents false-claim attacks. It derives transformation-invariant fingerprints from the singular values and eigenvalues of attention-weight matrices, and compares them with SimNet, a neural network trained using few-shot learning and data augmentation. The method discriminates strongly between related and unrelated models and is robust to quantization, pruning, and fine-tuning, with a compact fingerprint size. In practice, SELF offers a scalable, deployable framework for LLM IP forensics with low runtime overhead for repeated verification.

Abstract

The protection of Intellectual Property (IP) in Large Language Models (LLMs) represents a critical challenge in contemporary AI research. While fingerprinting techniques have emerged as a fundamental mechanism for detecting unauthorized model usage, existing methods -- whether behavior-based or structural -- suffer from vulnerabilities such as false claim attacks or susceptibility to weight manipulations. To overcome these limitations, we propose SELF, a novel intrinsic weight-based fingerprinting scheme that eliminates dependency on input and inherently resists false claims. SELF achieves robust IP protection through two key innovations: 1) unique, scalable, and transformation-invariant fingerprint extraction via singular value and eigenvalue decomposition of LLM attention weights, and 2) effective neural network-based fingerprint similarity comparison based on few-shot learning and data augmentation. Experimental results demonstrate that SELF maintains high IP infringement detection accuracy while showing strong robustness against various downstream modifications, including quantization, pruning, and fine-tuning attacks. Our code is available at https://github.com/HanxiuZhang/SELF_v2.

Paper Structure

This paper contains 45 sections, 2 theorems, 16 equations, 6 figures, 15 tables.

Key Result

Theorem 1

Under the transformation attack described in eq:attack_form, the matrices $\hat{X}_\sigma = \hat{W}_Q \hat{W}_K^T$ and $\hat{Y}_\sigma = \hat{W}_V \hat{W}_O$ differ from $X_\sigma$ and $Y_\sigma$ only by multiplication with permutation matrices. Since permutation matrices are orthogonal, $X_\sigma$ and $\hat{X}_\sigma$, as well as $Y_\sigma$ and $\hat{Y}_\sigma$, are orthogonally equivalent and consequently share the same singular values.
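The invariance claimed in Theorem 1 can be checked numerically. The following is a minimal NumPy sketch, not the authors' code. Since eq:attack_form is not reproduced here, the attack used below (a shared permutation `P` on the model dimension plus invertible matrices `C1` and `C2` that cancel inside the products) is an assumption, as are the toy shapes and all helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 8, 4  # toy sizes, chosen only for illustration


def singular_values(M):
    """Singular values of M, sorted in descending order."""
    return np.linalg.svd(M, compute_uv=False)


def well_conditioned_invertible(n, rng):
    """Random invertible n x n matrix: orthogonal factor times positive column scales."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * rng.uniform(0.5, 2.0, size=n)  # equals Q @ diag(scales)


# Toy attention weights (shapes are an assumption of this sketch).
W_Q = rng.standard_normal((d_model, d_head))
W_K = rng.standard_normal((d_model, d_head))
W_V = rng.standard_normal((d_model, d_head))
W_O = rng.standard_normal((d_head, d_model))

# A fixed cyclic-shift permutation matrix; it is orthogonal: P @ P.T == I.
P = np.eye(d_model)[np.roll(np.arange(d_model), 1)]
C1 = well_conditioned_invertible(d_head, rng)
C2 = well_conditioned_invertible(d_head, rng)

# ASSUMED attack form: permute the model dimension and insert invertible
# transformations that cancel when the weight products are formed.
W_Q_hat = P @ W_Q @ C1
W_K_hat = P @ W_K @ np.linalg.inv(C1).T
W_V_hat = P @ W_V @ C2
W_O_hat = np.linalg.inv(C2) @ W_O @ P.T

X, X_hat = W_Q @ W_K.T, W_Q_hat @ W_K_hat.T
Y, Y_hat = W_V @ W_O, W_V_hat @ W_O_hat

# The attacked products are P X P^T and P Y P^T, so the spectra should match.
print(np.allclose(singular_values(X), singular_values(X_hat)))
print(np.allclose(singular_values(Y), singular_values(Y_hat)))
```

Under this assumed attack the individual weights change, yet $\hat{X}_\sigma = P X_\sigma P^T$ and $\hat{Y}_\sigma = P Y_\sigma P^T$, which is the orthogonal equivalence the theorem relies on.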

Figures (6)

  • Figure 1: IP infringement detection pipeline using SELF.
  • Figure 2: Fingerprint extraction via matrix decomposition. For a given model, we extract its fingerprint using the first $N_\mathcal{F}$ Transformer block layers. Specifically, we first compute the fingerprint of each individual layer $\mathcal{F}^i$ and then aggregate the $N_\mathcal{F}$ layer-wise fingerprints to form the overall model fingerprint $\mathcal{F}$. For each layer, we first compute the singular-value-invariant matrices $X_{\sigma}^i, Y_{\sigma}^i$ and the eigenvalue-invariant matrices $X_{\lambda}^i, Y_{\lambda}^i$ from the attention weights $W_Q^i, W_K^i, W_V^i$, and $W_O^i$. Then, we extract the normalized singular value vectors $\sigma^i_{QK}, \sigma^i_{VO}$ and eigenvalue vectors $\lambda^i_{QK}, \lambda^i_{VO}$, which together form the fingerprint of layer $i$.
  • Figure 3: Fingerprint similarity and PPL change of Llama2-7B under different attacks. (a) and (b) show the results under fine-tuning attacks. (c) shows the results under SliceGPT pruning. Since a state-of-the-art LSTM achieves a PPL below 60 on the PTB dataset (merity2018regularizing), we consider a PPL of 60 or higher "unacceptable" for larger models such as Transformers.
  • Figure 4: HuRef calculation process in our false-claim attack.
  • Figure 5: False-claim attack on HuRef: both the feature vector and the human-readable fingerprint of the unrelated model (Qwen1.5-7B) become highly similar to those of the target model (Llama2-7B).
  • ...and 1 more figure
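The layer-wise extraction and aggregation described in the Figure 2 caption can be sketched as follows. This is a hedged illustration covering only the singular-value half of the fingerprint, because the eigenvalue-invariant matrices $X_{\lambda}^i, Y_{\lambda}^i$ are not defined in this summary. The L2 normalization, the truncation length `k`, the square toy weight shapes, and all function names are assumptions of this sketch rather than details from the paper.

```python
import numpy as np


def normalized_singular_values(M, k):
    """Top-k singular values of M, scaled to unit L2 norm (normalization is assumed)."""
    s = np.linalg.svd(M, compute_uv=False)[:k]  # svd returns them in descending order
    return s / np.linalg.norm(s)


def layer_fingerprint(W_Q, W_K, W_V, W_O, k):
    """Singular-value part of one layer's fingerprint: [sigma_QK, sigma_VO]."""
    sigma_qk = normalized_singular_values(W_Q @ W_K.T, k)  # from X_sigma = W_Q W_K^T
    sigma_vo = normalized_singular_values(W_V @ W_O, k)    # from Y_sigma = W_V W_O
    return np.concatenate([sigma_qk, sigma_vo])


def model_fingerprint(layers, n_f, k):
    """Stack the fingerprints of the first n_f layers into an (n_f, 2k) array."""
    return np.stack([layer_fingerprint(*layer, k) for layer in layers[:n_f]])


# Usage with random stand-in weights (three toy layers of 6 x 6 matrices).
rng = np.random.default_rng(1)
layers = [tuple(rng.standard_normal((6, 6)) for _ in range(4)) for _ in range(3)]
fp = model_fingerprint(layers, n_f=2, k=4)
print(fp.shape)
```

Stacking per-layer vectors is one simple way to aggregate; the paper's actual aggregation and fingerprint length may differ.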

Theorems & Definitions (2)

  • Theorem 1: Singular Value Invariance
  • Theorem 2: Eigenvalue Invariance