CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models

Zhenzhen Ren, GuoBiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

TL;DR

CoTSRF introduces a Chain of Thought (CoT)-based fingerprint for detecting infringement of open-source LLMs in a black-box setting. It collects CoT-rich responses with a High-Temperature Data Augmentation strategy, trains a CoT extractor via contrastive learning with a triplet-margin loss, and then verifies infringement by comparing the source and suspect CoT feature distributions through the KL divergence $\mathrm{KL}(D^S \| D^V)$, with densities estimated by kernel density estimation (KDE). The reported results cover four properties: effectiveness (TPR up to 100% with $I=100$), reliability against unseen benign LLMs (FPR near 0%), stealthiness (CoT prompts maintain semantic coherence), and robustness against output perturbation and fine-tuning attacks. The approach offers a practical, non-intrusive mechanism for LLM license enforcement in real-world deployments.
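To make the triplet-margin loss mentioned above concrete, here is a minimal sketch in plain Python. It shows only the standard loss formula, $\max(0,\ d(a,p) - d(a,n) + m)$; the function names, the toy 2-D vectors, and the margin value are illustrative assumptions, and how the paper actually builds its anchor/positive/negative triplets from LLM responses is not specified here.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet-margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-D "features" (hypothetical values, not from the paper).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1.0 from the anchor
easy_negative = [3.0, 4.0]   # distance 5.0 from the anchor
hard_negative = [0.0, 1.5]   # distance 1.5 from the anchor

print(triplet_margin_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_margin_loss(anchor, positive, hard_negative))  # 0.5
```

An extractor trained with such a loss is pushed to place features of the same origin close together and features of different origins at least a margin apart, which is what makes a later distribution comparison meaningful.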

Abstract

Despite providing superior performance, open-source large language models (LLMs) are vulnerable to abusive usage. To address this issue, recent works propose LLM fingerprinting methods to identify the specific source LLMs behind suspect applications. However, these methods fail to provide stealthy and robust fingerprint verification. In this paper, we propose a novel LLM fingerprinting scheme, namely CoTSRF, which utilizes the Chain of Thought (CoT) as the fingerprint of an LLM. CoTSRF first collects the responses from the source LLM by querying it with crafted CoT queries. Then, it applies contrastive learning to train a CoT extractor that extracts the CoT feature (i.e., fingerprint) from the responses. Finally, CoTSRF conducts fingerprint verification by comparing the Kullback-Leibler divergence between the CoT features of the source and suspect LLMs against an empirical threshold. Various experiments have been conducted to demonstrate the advantage of our proposed CoTSRF for fingerprinting LLMs, particularly in stealthy and robust fingerprint verification.
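The verification step described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes scalar (1-D) features, a Gaussian kernel with a hand-picked bandwidth, a resubstitution estimate of the KL divergence evaluated at the source samples, and a hypothetical threshold below which the suspect is flagged. Real CoT features would be higher-dimensional, and the paper sets its threshold empirically.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def kl_divergence(source, suspect, bandwidth=0.3, eps=1e-12):
    """Estimate KL(D^S || D^V) as the mean of log p(x) - log q(x) over source samples.

    p and q are KDEs fitted to the source and suspect features; eps keeps
    the logarithm finite where a density underflows.
    """
    p = gaussian_kde(source, bandwidth)
    q = gaussian_kde(suspect, bandwidth)
    return sum(math.log(p(x) + eps) - math.log(q(x) + eps) for x in source) / len(source)

def is_infringing(source, suspect, threshold, bandwidth=0.3):
    """Flag the suspect when its feature distribution is close to the source's."""
    return kl_divergence(source, suspect, bandwidth) < threshold

# Synthetic stand-ins for extracted features (assumed data, for illustration only).
rng = random.Random(0)
source_feats = [rng.gauss(0.0, 1.0) for _ in range(200)]
derived_feats = [rng.gauss(0.0, 1.0) for _ in range(200)]    # same distribution as source
unrelated_feats = [rng.gauss(3.0, 1.0) for _ in range(200)]  # shifted distribution

THRESHOLD = 1.0  # hypothetical value; the paper uses an empirically chosen threshold
print(is_infringing(source_feats, derived_feats, THRESHOLD))
print(is_infringing(source_feats, unrelated_feats, THRESHOLD))
```

Because the KL divergence is asymmetric, the argument order matters; the sketch follows the $\mathrm{KL}(D^S \| D^V)$ direction given in the TL;DR, with the source distribution first.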

Paper Structure

This paper contains 23 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: LLM fingerprint verification process of the proposed method in a black-box access setting, where the model provider only has API access to the suspect LLM.
  • Figure 2: Framework of the proposed CoTSRF.
  • Figure 3: Distribution of the Euclidean distance between the CoT features of the source LLM and those of the infringing/benign LLM.
  • Figure 4: TPR of CoTSRF and TRAP under different temperature settings (from 0.2 to 1.8).