Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, Sewoong Oh
TL;DR
This work reframes model fingerprinting around scalability, introducing Perinucleus sampling to embed a large number of fingerprints (up to $M=24{,}576$) into LLMs with minimal utility loss. Fingerprints are generated from in-distribution keys and responses drawn via Perinucleus sampling (threshold $t=0.8$, width $k=3$), and are stabilized through a regularized training regime that combines a Weight Deviation Penalty with Data-Mixing. The approach shows strong persistence after post-training and generalizes across multiple model families, while providing a provable defense against collusion attacks: with $N$ models and maximum coalition size $K$, $M = O(2^K K^{K+1} \log(N/\delta))$ fingerprints suffice to detect at least one colluder with high probability. Together, these contributions enable secure, scalable model sharing in open ecosystems and highlight practical trade-offs between scalability, uniqueness, and harmlessness in fingerprint design.
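As a rough illustration of the sampling step, the sketch below assumes that Perinucleus sampling first finds the top-p nucleus at threshold $t$ and then picks the response token uniformly from the $k$ most likely tokens just outside that nucleus. This reading, the function names, and the toy distribution are assumptions made for illustration and are not taken from the authors' code.

```python
import random


def perinucleus_candidates(probs, t=0.8, k=3):
    """Return the k most likely tokens that fall just outside the top-p nucleus.

    probs: dict mapping token -> next-token probability (hypothetical input format).
    t:     nucleus threshold (cumulative probability mass).
    k:     width of the band outside the nucleus.
    """
    # Rank tokens from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # The nucleus is the smallest prefix whose cumulative mass reaches t.
    cumulative = 0.0
    nucleus_size = 0
    for token in ranked:
        cumulative += probs[token]
        nucleus_size += 1
        if cumulative >= t:
            break

    # The candidates are the next k tokens after the nucleus boundary.
    return ranked[nucleus_size:nucleus_size + k]


def perinucleus_sample(probs, t=0.8, k=3, rng=None):
    """Sample one token uniformly from the perinucleus candidates (assumed rule)."""
    rng = rng or random.Random()
    candidates = perinucleus_candidates(probs, t=t, k=k)
    if not candidates:
        raise ValueError("no tokens lie outside the nucleus")
    return rng.choice(candidates)


# Toy next-token distribution; the nucleus at t=0.8 is {"a", "b"}.
toy_probs = {"a": 0.5, "b": 0.35, "c": 0.06, "d": 0.05, "e": 0.03, "f": 0.01}
response_token = perinucleus_sample(toy_probs, t=0.8, k=3, rng=random.Random(0))
```

Under this assumed rule the response is unlikely enough to be distinctive, since it is excluded from the nucleus, yet still plausible for the model, since it sits right at the nucleus edge.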
Abstract
Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower the false discovery rate, counter fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that scalability is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered, and introduce a new method, dubbed Perinucleus sampling, to generate scalable, persistent, and harmless fingerprints. We demonstrate that this scheme can add 24,576 fingerprints to a Llama-3.1-8B model (two orders of magnitude more than existing schemes) without degrading the model's utility. Our inserted fingerprints persist even after supervised fine-tuning on standard post-training data. We further address security risks for fingerprinting, and theoretically and empirically show how a scalable fingerprinting scheme like ours can mitigate these risks. Our code is available at https://github.com/SewoongLab/scalable-fingerprinting-of-llms
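To see why collusion resistance calls for many fingerprints, the sketch below evaluates the growth term $2^K K^{K+1} \log(N/\delta)$ from the bound quoted in the TL;DR. The constant hidden by the $O(\cdot)$ is not given here, so the values are only relative scales and not actual fingerprint counts; the function name and the example values of $N$ and $\delta$ are hypothetical.

```python
import math


def collusion_growth_term(num_models, max_coalition, delta):
    """Evaluate 2^K * K^(K+1) * log(N / delta), the term inside the O(.) bound.

    The hidden constant is unknown, so treat the result as a relative scale only.
    """
    K = max_coalition
    return (2 ** K) * (K ** (K + 1)) * math.log(num_models / delta)


# Hypothetical setting: N = 100 shared models, failure probability delta = 0.01.
growth_by_coalition = {
    K: collusion_growth_term(num_models=100, max_coalition=K, delta=0.01)
    for K in (1, 2, 3)
}

for K, value in growth_by_coalition.items():
    print(f"K={K}: growth term = {value:.1f}")
```

The term grows only logarithmically in the number of models $N$ but very quickly in the coalition size $K$, which is one way to read the abstract's argument that a scheme must support a large number of fingerprints.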
