Leave No TRACE: Black-box Detection of Copyrighted Dataset Usage in Large Language Models via Watermarking

Jingqi Zhang; Ruibo Chen; Yingqing Yang; Peihua Mai; Heng Huang; Yan Pang

TL;DR

TRACE provides a practical, fully black-box method to verify copyrighted dataset usage in LLM fine-tuning by watermarking datasets with distortion-free rewrites guided by a private key and detecting the watermark via an entropy-gated analysis of model outputs. The approach yields statistically significant evidence across diverse datasets and model families, enabling multi-dataset attribution and demonstrating robustness to continued pretraining while preserving text quality and downstream performance. The method relies on a two-stage process: (i) watermarked dataset rewriting and (ii) black-box detection that concentrates on high-uncertainty token positions to amplify signal. These results offer a scalable, private-key-based mechanism for rights holders to verify data usage in commercial LLMs and establish TRACE as a practical route for dataset copyright protection in real-world deployments.
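To make stage (i) concrete, the following is a minimal sketch of one well-known way to sample text without distorting the model's distribution while still embedding a key-dependent signal: the Gumbel-max style rule, which picks the token maximizing `log(r_t) / p_t` for keyed pseudo-random numbers `r_t`. This is an illustrative stand-in, not the TRACE rewriting procedure, whose details are not given in this summary. The names `keyed_uniform` and `watermarked_choice`, the use of SHA-256, and the string-valued context are all assumptions made for the example.

```python
import hashlib
import math


def keyed_uniform(key: str, context: str, token: str) -> float:
    """Pseudo-random number in (0, 1] derived from key, context, and token.

    Hypothetical helper: SHA-256 is an assumption chosen for illustration.
    """
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / (2**64 + 2)


def watermarked_choice(probs: dict, key: str, context: str) -> str:
    """Pick the token maximizing log(r_t) / p_t (Gumbel-max style rule).

    Averaged over keys, each token is chosen with probability p_t, so the
    output distribution is unchanged; a key holder can still recompute r_t
    and check that chosen tokens tend to have large values.
    """
    best_token, best_score = None, -math.inf
    for token, p in probs.items():
        if p <= 0.0:
            continue  # tokens with zero probability are never selected
        score = math.log(keyed_uniform(key, context, token)) / p
        if score > best_score:
            best_token, best_score = token, score
    return best_token


if __name__ == "__main__":
    next_token_probs = {"cat": 0.6, "dog": 0.3, "eel": 0.1}
    print(watermarked_choice(next_token_probs, key="secret", context="the"))
```

In a rewriting pipeline of this kind, `probs` would come from the language model that paraphrases the dataset, and `context` would be some function of the preceding tokens; both are left abstract here.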

Abstract

Large Language Models (LLMs) are increasingly fine-tuned on smaller, domain-specific datasets to improve downstream performance. These datasets often contain proprietary or copyrighted material, raising the need for reliable safeguards against unauthorized use. Existing membership inference attacks (MIAs) and dataset-inference methods typically require access to internal signals such as logits, while current black-box approaches often rely on handcrafted prompts or a clean reference dataset for calibration, both of which limit practical applicability. Watermarking is a promising alternative, but prior techniques can degrade text quality or reduce task performance. We propose TRACE, a practical framework for fully black-box detection of copyrighted dataset usage in LLM fine-tuning. TRACE rewrites datasets with distortion-free watermarks guided by a private key, ensuring both text quality and downstream utility. At detection time, we exploit the radioactivity effect of fine-tuning on watermarked data and introduce an entropy-gated procedure that selectively scores high-uncertainty tokens, substantially amplifying detection power. Across diverse datasets and model families, TRACE consistently achieves significant detections (p < 0.05), often with extremely strong statistical evidence. Furthermore, it supports multi-dataset attribution and remains robust even after continued pretraining on large non-watermarked corpora. These results establish TRACE as a practical route to reliable black-box verification of copyrighted dataset usage. We will make our code available at: https://github.com/NusIoraPrivacy/TRACE.
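The idea of entropy-gated scoring can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the scoring statistic or how token entropy is estimated in a black-box setting, so the example substitutes a keyed hash bit per token pair and an exact one-sided binomial test, and it takes per-token entropy estimates as given inputs. The names `keyed_bit`, `binomial_tail`, and `entropy_gated_pvalue`, and the `threshold` parameter, are hypothetical.

```python
import hashlib
import math


def keyed_bit(key: str, prev_token: str, token: str) -> int:
    """Pseudo-random bit from the private key and a (previous, current) token pair.

    Assumption for illustration: SHA-256 stands in for the keyed function.
    """
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] & 1


def binomial_tail(hits: int, n: int) -> float:
    """Exact P[X >= hits] for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, k) for k in range(hits, n + 1)) / 2**n


def entropy_gated_pvalue(tokens, entropies, key, threshold):
    """Score only positions whose estimated entropy reaches `threshold`.

    Returns (p_value, hits, scored). Position 0 is skipped because it has
    no previous token. Without a watermark signal each scored bit is 1 with
    probability 0.5, so a small p-value suggests key-dependent bias.
    """
    hits = 0
    scored = 0
    for i in range(1, len(tokens)):
        if entropies[i] < threshold:
            continue  # low-uncertainty token: carries little watermark signal
        scored += 1
        hits += keyed_bit(key, tokens[i - 1], tokens[i])
    if scored == 0:
        return 1.0, 0, 0
    return binomial_tail(hits, scored), hits, scored


if __name__ == "__main__":
    model_output = ["the", "quick", "brown", "fox", "jumps"]
    entropy_estimates = [0.0, 2.1, 0.3, 1.8, 2.5]
    print(entropy_gated_pvalue(model_output, entropy_estimates, "secret", 1.0))
```

Gating matters because positions where the model is nearly deterministic produce the same token with or without watermark influence; including them adds unbiased noise to the count and dilutes the test.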
