Scalable Fingerprinting of Large Language Models

Anshul Nasery, Jonathan Hayase, Creston Brooks, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, Sewoong Oh

TL;DR

This work reframes model fingerprinting around Scalability, introducing Perinucleus sampling to embed a large number of fingerprints (up to $M=24{,}576$) into LLMs with minimal utility loss. Fingerprints are generated from in-distribution keys and responses drawn via Perinucleus sampling (threshold $t=0.8$, width $k=3$), and are stabilized through a regularized training regime that combines Weight Deviation Penalty and Data-Mixing. The approach demonstrates strong persistence after post-training and generalizes across multiple model families, while providing a provable defense against collusion attacks: with $N$ models and maximum coalition size $K$, $M = O(2^K K^{K+1} \log(N/\delta))$ fingerprints can ensure detection of at least one colluder with high probability. Together, these contributions enable secure, scalable model sharing in open ecosystems and highlight practical trade-offs between scalability, uniqueness, and harmlessness in fingerprint design.
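To make the roles of the threshold $t$ and width $k$ concrete, here is a minimal sketch of how a Perinucleus response token might be chosen. It assumes the scheme picks the token uniformly from the $k$ tokens ranked just outside the top-$t$ nucleus of the model's next-token distribution; the summary above gives only the values of $t$ and $k$, so this reading, the function names, and the toy distribution are illustrative assumptions rather than the authors' implementation.

```python
import random

def perinucleus_candidates(probs, t=0.8, k=3):
    """Return the k tokens ranked just outside the top-t nucleus.

    probs: dict mapping token -> next-token probability given a key
    (a hypothetical stand-in for a real model's output distribution).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    cumulative = 0.0
    boundary = len(ranked)  # index of the first token outside the nucleus
    for i, (_, p) in enumerate(ranked):
        cumulative += p
        if cumulative >= t:
            boundary = i + 1  # nucleus is ranked[:boundary]
            break
    return [tok for tok, _ in ranked[boundary:boundary + k]]

def perinucleus_sample(probs, t=0.8, k=3, rng=None):
    """Pick one candidate uniformly at random (assumed sampling rule)."""
    rng = rng or random.Random()
    candidates = perinucleus_candidates(probs, t, k)
    if not candidates:
        raise ValueError("no tokens lie outside the nucleus")
    return rng.choice(candidates)

# Toy next-token distribution, invented for illustration only.
toy = {"Paris": 0.55, "the": 0.30, "a": 0.06, "located": 0.04,
       "in": 0.03, "known": 0.015, "zebra": 0.005}
candidates = perinucleus_candidates(toy, t=0.8, k=3)
response_token = perinucleus_sample(toy, t=0.8, k=3, rng=random.Random(0))
```

In this toy case the nucleus is the two most likely tokens, so the candidates are the next three: plausible continuations that the model would rarely produce on its own. Raising $t$ pushes the candidates toward less likely tokens, and raising $k$ widens the pool and makes the response more random.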

Abstract

Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower false discovery rate, fight fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that scalability is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered, and introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model -- two orders of magnitude more than existing schemes -- without degrading the model's utility. Our inserted fingerprints persist even after supervised fine-tuning on standard post-training data. We further address security risks for fingerprinting, and theoretically and empirically show how a scalable fingerprinting scheme like ours can mitigate these risks. Our code is available at https://github.com/SewoongLab/scalable-fingerprinting-of-llms
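To give a feel for why defending against coalitions calls for many fingerprints, the sketch below evaluates the scaling term from the collusion bound $M = O(2^K K^{K+1} \log(N/\delta))$ for $N$ models, maximum coalition size $K$, and failure probability $\delta$. The big-O hides a constant that is not given here, and the natural logarithm is an assumption, so the numbers show growth only and are not required fingerprint counts. The function name is hypothetical.

```python
import math

def collusion_scaling_term(N, K, delta):
    """Evaluate 2^K * K^(K+1) * ln(N / delta), ignoring the hidden constant.

    N: number of fingerprinted models, K: maximum coalition size,
    delta: allowed failure probability. Natural log is an assumption.
    """
    if N < 1 or K < 1 or not (0.0 < delta < 1.0):
        raise ValueError("need N >= 1, K >= 1 and 0 < delta < 1")
    return (2 ** K) * (K ** (K + 1)) * math.log(N / delta)

# Growth with coalition size for a fixed N and delta (illustrative values).
for K in (1, 2, 3, 4):
    print(K, round(collusion_scaling_term(N=100, K=K, delta=0.01)))
```

The factor depending on $K$ is $2^2 \cdot 2^3 = 32$ for $K=2$ and $2^3 \cdot 3^4 = 648$ for $K=3$, so each extra colluder multiplies the requirement substantially, while the dependence on $N$ and $\delta$ is only logarithmic.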

Paper Structure

This paper contains 47 sections, 2 theorems, 11 equations, 16 figures, 6 tables, 2 algorithms.

Key Result

Proposition 3.1

Given a choice of $k$ in Perinucleus sampling and $M$ distinct fingerprint queries, if we claim ownership of a model when model responses to more than $m$ fingerprint keys match the fingerprint responses for some $m$, then the false positive rate (FPR) satisfies
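The decision rule in this proposition (claim ownership when more than $m$ of the $M$ fingerprint keys elicit the fingerprint response) can be sketched as follows. The `binomial_tail` helper is only an illustration of how such a rule behaves under a hypothetical model in which an unrelated model matches each key independently with probability `p`; it is not the bound from Proposition 3.1, and `p` and all function names are assumptions.

```python
from math import comb

def count_matches(model_responses, fingerprint_responses):
    """Number of keys whose model response equals the fingerprint response."""
    if len(model_responses) != len(fingerprint_responses):
        raise ValueError("one model response is needed per fingerprint key")
    return sum(1 for got, want in zip(model_responses, fingerprint_responses)
               if got == want)

def claim_ownership(model_responses, fingerprint_responses, m):
    """Claim ownership when strictly more than m responses match."""
    return count_matches(model_responses, fingerprint_responses) > m

def binomial_tail(M, m, p):
    """P[X > m] for X ~ Binomial(M, p).

    Hypothetical false-positive model: each of the M keys is matched
    independently with probability p by a model that was never fingerprinted.
    """
    return sum(comb(M, j) * p ** j * (1 - p) ** (M - j)
               for j in range(m + 1, M + 1))

# Illustrative use with made-up responses.
owned = claim_ownership(["a", "b", "c"], ["a", "b", "x"], m=1)
```

Under this toy model, raising the threshold $m$ or lowering the chance match probability shrinks the tail quickly, which is the intuition behind querying many fingerprints.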

Figures (16)

  • Figure 1: An overview of model fingerprinting. We use the LLM to generate fingerprints with relatively low conditional probability for the response using our Perinucleus sampling scheme (Sec. \ref{sec:perinucleus}), generating responses which are sensible, but uncommon. We insert fingerprints by fine-tuning the model with regularizers to preserve performance (Sec. \ref{sec:regularization}). At inference time, we aim to detect the fingerprints on a potentially modified model hosted by (a coalition of) adversaries (Sec. \ref{sec:security}).
  • Figure 2: Fingerprint Design -- (Left) We plot the average OpenLLM \citep{open-llm-leaderboard} scores (a standard benchmark) of Llama-3.1-8B models (fingerprinted with 1024 keys and a randomly chosen response for each key) against the average log perplexity of the fingerprint keys. Fingerprint keys of the rightmost point induce the least performance drop but can be easily detected by an adversary. We propose using the leftmost point, generated with low temperature. (Center) Model performance using responses from Perinucleus sampling with fixed width, $k=3$, and low-perplexity keys. We vary the threshold, $t$ (changing the conditional probability of responses). Performance sharply drops for $t>0.9$ as pairing keys with unlikely responses causes significant distortion to the fingerprinted model. (Right) Fixing $t=0.8$ and varying the width $k$ for Perinucleus fingerprint responses, we find that scores remain flat for values of $k \leq 10$ before dropping sharply for larger $k$ as the response becomes more random.
  • Figure 3: Harmlessness and Persistence of Fingerprints on Llama-3.1-8B. (Left) We insert up to 24,576 fingerprints into a Llama-3.1-8B model and measure the utility (on OpenLLM) of this model. Perinucleus fingerprints lead to a lower loss in utility for the same number of fingerprints added, compared to the baseline of ENGLISH-RANDOM from \citep{xu2024instructionalfingerprintinglargelanguage, russinovich2024heythatsmodelintroducing}. (Right) Persistence of the fingerprints (i.e., the percentage of fingerprints which are correctly recalled after SFT) is higher for Perinucleus fingerprints compared to the baselines of RANDOM and ENGLISH-RANDOM from \citep{xu2024instructionalfingerprintinglargelanguage, russinovich2024heythatsmodelintroducing}.
  • Figure 4: Performance across models. We plot the average scores of models fingerprinted with our scheme on OpenLLM for different sized Llama 3.1 models (left) and for base (middle) and instruction-tuned (right) models from other families. We find that the relative performance is over 95% even at 8192 fingerprints across models. The x-axis is logarithmic. See \ref{fig:other-model-detailed} for comparison to baselines.
  • Figure 5: Effect of number of samples, epochs and dataset for fine-tuning on persistence for Llama-3.1-8B: (Left) Persistence decreases roughly log-linearly with the number of SFT samples. (Middle) Persistence decreases slightly before stabilizing as the number of SFT epochs increases. (Right) Persistence is also affected by the distribution of the SFT data, with chat-like data having a larger effect than math data. Finally, additional DPO after instruction tuning does not lead to many more fingerprints being forgotten. These trends are consistent for 1024 and 4096 fingerprints.
  • ...and 11 more figures
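The persistence metric used in Figures 3 and 5 (the percentage of fingerprints correctly recalled after SFT) can be computed as in the sketch below. It assumes recall means an exact match between the fine-tuned model's response and the fingerprint response for each key; the function name and the toy inputs are hypothetical.

```python
def persistence(responses_after_sft, fingerprint_responses):
    """Percentage of fingerprints whose response is still recalled after SFT.

    Assumes exact-match recall, with one post-SFT response per fingerprint key.
    """
    if not fingerprint_responses:
        raise ValueError("at least one fingerprint is required")
    if len(responses_after_sft) != len(fingerprint_responses):
        raise ValueError("one response is needed per fingerprint key")
    recalled = sum(1 for got, want in zip(responses_after_sft, fingerprint_responses)
                   if got == want)
    return 100.0 * recalled / len(fingerprint_responses)

# Toy example: three of four fingerprints survive fine-tuning.
score = persistence(["a", "b", "x", "d"], ["a", "b", "c", "d"])
```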

Theorems & Definitions (5)

  • Proposition 3.1
  • Definition 5.1: Collusion resistant fingerprinting
  • Proposition 5.3
  • Proof of \ref{prop:fingerprint-guarantee}
  • Proof of \ref{prop:fpr}