Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
Shuo Shao, Yiming Li, Hongwei Yao, Yifei Chen, Yuchen Yang, Zhan Qin
TL;DR
This work uses Fisher information theory to show formally that input gradients carry more information about an LLM's parameters than its outputs do. Building on this insight, it introduces ZeroPrint, a black-box fingerprinting method that estimates the model's Jacobian through zeroth-order gradient estimation on semantically perturbed text. By constructing a robust query set and applying ridge regression to input and output embeddings, ZeroPrint derives a distinctive model fingerprint that enables copyright auditing without access to model internals. Empirical results on LeaFBench show state-of-the-art performance and resilience to adaptive attacks, with a runtime practical for real-world use. The approach offers a principled, scalable path toward reliable LLM provenance verification when only API-level access is available.
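The estimator summarized above can be illustrated with a minimal sketch. It assumes the black-box model can be treated as a smooth map f from an input-embedding vector to an output-embedding vector. It queries f at a base point and at many perturbed points, then fits the Jacobian J in Δy ≈ J·Δx by ridge regression, so only forward queries are ever made. The function name `estimate_jacobian`, its parameters, and the use of Gaussian perturbations (standing in for the embedding shifts that semantic-preserving word substitutions would produce) are illustrative assumptions, not ZeroPrint's actual implementation.

```python
import numpy as np


def estimate_jacobian(f, x0, n_samples=200, sigma=1e-3, ridge=1e-3, seed=0):
    """Estimate the Jacobian of a black-box map f at x0 from forward queries only.

    Hypothetical helper: f maps an input-embedding vector to an
    output-embedding vector; no gradients of f are ever computed.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    y0 = np.asarray(f(x0), dtype=float)

    # Unit-scale random directions. Gaussian noise is an assumption standing in
    # for the embedding shifts caused by semantic-preserving word substitutions.
    units = rng.standard_normal((n_samples, x0.size))
    deltas = sigma * units

    # Zeroth-order step: output differences, dy_i ~= J @ delta_i to first order.
    dys = np.stack([np.asarray(f(x0 + d), dtype=float) - y0 for d in deltas])

    # Ridge regression in unit-scale coordinates so that `ridge` does not depend
    # on sigma: solve (U^T U + ridge*I) B = U^T dY, then J^T = B / sigma.
    gram = units.T @ units + ridge * np.eye(x0.size)
    jt = np.linalg.solve(gram, units.T @ dys) / sigma
    return jt.T  # shape (d_out, d_in)
```

For a linear map f(x) = A·x + b, the estimate recovers A up to negligible ridge shrinkage. For a smooth non-linear map, it approximates the local Jacobian at x0. Under these assumptions, a fingerprint comparison would estimate Jacobians for a suspect model and a source model on the same query set and then compare them; the similarity measure for that step is left open here.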
Abstract
The substantial investment required to develop Large Language Models (LLMs) makes them valuable intellectual property, raising significant concerns about copyright protection. LLM fingerprinting has emerged as a key technique to address these concerns. It aims to verify a model's origin by extracting an intrinsic, unique signature (a "fingerprint") and comparing it with that of a source model to identify illicit copies. However, existing black-box fingerprinting methods often fail to generate distinctive LLM fingerprints. This ineffectiveness arises because black-box methods typically rely on model outputs, which lose critical information about the model's unique parameters as they pass through non-linear functions. To address this, we first use Fisher information theory to demonstrate formally that the gradient with respect to the model's input is a more informative feature for fingerprinting than the output. Based on this insight, we propose ZeroPrint, a novel method that approximates these information-rich gradients in a black-box setting using zeroth-order estimation. To apply this estimation to discrete text, ZeroPrint simulates input perturbations through semantic-preserving word substitutions. These perturbations allow ZeroPrint to estimate the model's Jacobian matrix, which serves as its unique fingerprint. Experiments on the standard benchmark show that ZeroPrint achieves state-of-the-art effectiveness and robustness, significantly outperforming existing black-box methods.
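For reference, the Fisher information matrix invoked above measures how much an observation $Z$ with density $p(z;\theta)$ reveals about the parameters $\theta$. A standard data-processing property (general background under the usual regularity conditions, not the paper's own theorem) states that deterministic post-processing $g$, such as a non-linear output head, cannot increase this information:

```latex
\mathcal{I}_Z(\theta) = \mathbb{E}_{z \sim p(\cdot;\theta)}\!\left[\nabla_\theta \log p(z;\theta)\,\nabla_\theta \log p(z;\theta)^{\top}\right],
\qquad
\mathcal{I}_{g(Z)}(\theta) \preceq \mathcal{I}_Z(\theta),
```

where $\preceq$ denotes the Loewner (positive semidefinite) order. The paper's specific comparison between input gradients and outputs is a separate result and is not reproduced here.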
