Source Attribution for Large Language Model-Generated Data

Jingtan Wang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

TL;DR

This work addresses intellectual property (IP) concerns surrounding LLM training data. It introduces WASA, a watermarking-based source attribution framework that embeds a unique watermark for each data provider and trains an LLM to map texts to their watermarks. The resulting WASA-LLM uses a dual-space architecture with separate embedding and prediction paths for word tokens and watermark tokens. It is trained to minimize $L_{\text{WASA-LLM}} = L_{\text{lm}} + L_{\text{wtm}}$, the sum of the loss on word tokens and the loss on watermark tokens, which lets it achieve accurate attribution while preserving generation quality. Empirical results on the ArXiv and BookSum datasets show high single-source attribution accuracy and strong robustness against watermark removal and various text attacks. The results also demonstrate scalability to hundreds of providers and transferability across multiple LLMs. The framework supports data provenance verification and offers practical protection for data providers, although the authors acknowledge limitations and ethical considerations for real-world deployment.
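The combined objective can be illustrated with a small sketch. Assume each position carries logits over two separate vocabularies (word tokens and watermark tokens), plus a flag saying which kind of token comes next. Under that assumption, $L_{\text{lm}}$ accumulates the cross-entropy of word tokens in the word space, and $L_{\text{wtm}}$ accumulates the cross-entropy of watermark tokens in the watermark space. The function names, the per-position layout, and the use of summed (rather than averaged) negative log-likelihoods are illustrative assumptions, not the paper's code.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single prediction space."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def wasa_loss(
    word_logits: Sequence[Sequence[float]],
    wtm_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    is_watermark: Sequence[bool],
) -> Tuple[float, float, float]:
    """Return (L_lm, L_wtm, L_lm + L_wtm) for one sequence (hypothetical layout).

    At position i, word_logits[i] scores the word vocabulary and wtm_logits[i]
    scores the watermark vocabulary; targets[i] indexes the next token inside
    its own vocabulary, and is_watermark[i] says which vocabulary that is.
    """
    l_lm, l_wtm = 0.0, 0.0
    for w_row, m_row, t, wm in zip(word_logits, wtm_logits, targets, is_watermark):
        if wm:
            l_wtm -= log_softmax(m_row)[t]  # watermark token: scored in watermark space only
        else:
            l_lm -= log_softmax(w_row)[t]  # word token: scored in word space only
    return l_lm, l_wtm, l_lm + l_wtm


# Toy example: 3 positions, word vocabulary of 4, watermark vocabulary of 6.
l_lm, l_wtm, total = wasa_loss(
    word_logits=[[0.0] * 4] * 3,
    wtm_logits=[[0.0] * 6] * 3,
    targets=[1, 2, 5],
    is_watermark=[False, False, True],
)
print(round(l_lm, 4), round(l_wtm, 4), round(total, 4))
```

With all-zero logits, each word target costs $\log 4$ and each watermark target costs $\log 6$. The toy example therefore gives $L_{\text{lm}} = 2\log 4$ and $L_{\text{wtm}} = \log 6$.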

Abstract

The impressive performances of Large Language Models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the Intellectual Property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data used to train the LLMs. It is therefore imperative to be able to perform source attribution, i.e., to identify the data provider(s) who contributed to the generation of a synthetic text by an LLM. In this paper, we show that this problem can be tackled by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries). We then propose a source attribution framework whose algorithmic designs satisfy these key properties. Our framework enables an LLM to learn an accurate mapping from the generated texts to data providers, which sets the foundation for effective source attribution. Extensive empirical evaluations show that our framework achieves effective source attribution.
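One way to picture provider-specific watermarks that map texts back to data providers is sketched below. It assumes each provider is assigned a fixed-length string of invisible Unicode characters, and that attribution is a lookup from the extracted string to a provider index. The six code points, the length of 10, and insertion after the first word are all hypothetical choices. The sketch covers only the bookkeeping of the mapping; in WASA the LLM itself learns to generate the watermark, which is not modeled here.

```python
from typing import Dict, Optional

# Hypothetical alphabet of six invisible Unicode code points: zero width space,
# zero width non-joiner, zero width joiner, invisible times, invisible separator,
# invisible plus. This choice of characters is an assumption for illustration.
INVISIBLE_CHARS = ["\u200b", "\u200c", "\u200d", "\u2062", "\u2063", "\u2064"]
INVISIBLE_SET = set(INVISIBLE_CHARS)
WATERMARK_LEN = 10  # assumed fixed watermark length per provider


def provider_watermark(provider_id: int) -> str:
    """Encode a provider index as a fixed-length base-6 string of invisible characters."""
    base = len(INVISIBLE_CHARS)
    if not 0 <= provider_id < base ** WATERMARK_LEN:
        raise ValueError("provider_id out of range")
    chars = []
    n = provider_id
    for _ in range(WATERMARK_LEN):
        n, r = divmod(n, base)
        chars.append(INVISIBLE_CHARS[r])
    return "".join(reversed(chars))


def embed(text: str, provider_id: int) -> str:
    """Insert the provider's watermark after the first word (placement is an assumption)."""
    head, sep, tail = text.partition(" ")
    return head + provider_watermark(provider_id) + sep + tail


def attribute(text: str, registry: Dict[str, int]) -> Optional[int]:
    """Collect the invisible characters in the text and look them up in the registry."""
    found = "".join(ch for ch in text if ch in INVISIBLE_SET)
    return registry.get(found[:WATERMARK_LEN])


registry = {provider_watermark(i): i for i in range(50)}
original = "Large language models are trained on data from many providers."
marked = embed(original, 7)
print(attribute(marked, registry))  # prints 7
```

Embedding leaves the visible text unchanged, because removing the invisible characters recovers the original sentence. A text that carries no watermark characters attributes to `None`.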

Paper Structure

This paper contains 64 sections, 13 equations, 13 figures, 30 tables, and 1 algorithm.

Introduction
Key Properties of Watermarking for Source Attribution
Watermarking for Source Attribution (WASA) Framework
Embedding Watermarks into Texts
Training WASA-LLM
Generating Texts with Embedded Watermarks using WASA-LLM
Experiments
Accuracy
Robustness
Scalability
Performance Preservation
Other Key Properties
Related Work
Conclusion
Ethical Considerations
...and 49 more sections

Figures (13)

Figure 1: Illustration of WASA's problem setting. Watermarks are embedded into the texts from data providers for training the LLM. The LLM produced by our WASA framework can generate synthetic texts with embedded watermarks that allow for effective source attribution.
Figure 2: Sentences embedded (the first one) and not embedded (the second one) with our imperceptible watermark visualized in the bottom sentence.
Figure 3: Separation of token embeddings and prediction spaces for texts and watermarks.
Figure 4: Training losses for word tokens (Loss_lm) and watermark tokens (Loss_wtm) when obtaining WASA-LLM from second-stage pre-training of the GPT2 model on the ArXiv dataset.
Figure 5: Example of training samples in the SFT dataset.
...and 8 more figures
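Figure 3 describes separate token-embedding and prediction spaces for words and watermarks. The minimal sketch below illustrates that idea under assumed conventions. Token ids below `n_words` index a word table and the remaining ids index a watermark table. The prediction step scores a hidden state against each table separately. Weight tying and the toy deterministic initialization are assumptions made for brevity. `SplitEmbedding` is a hypothetical name, not part of any library or of the paper's release.

```python
from typing import List, Tuple


class SplitEmbedding:
    """Hypothetical layer with separate embedding tables for word and watermark tokens.

    Ids 0..n_words-1 are word tokens; ids n_words..n_words+n_wtm-1 are watermark tokens.
    The same tables are reused as output projections (weight tying is an assumption).
    """

    def __init__(self, n_words: int, n_wtm: int, dim: int) -> None:
        self.n_words = n_words
        self.n_wtm = n_wtm
        # Toy deterministic values so the example is reproducible; real tables are learned.
        self.word_table = [[float(i)] * dim for i in range(n_words)]
        self.wtm_table = [[-float(j + 1)] * dim for j in range(n_wtm)]

    def is_watermark(self, token_id: int) -> bool:
        return token_id >= self.n_words

    def lookup(self, token_id: int) -> List[float]:
        """Route a token id to the embedding table of its own space."""
        if not 0 <= token_id < self.n_words + self.n_wtm:
            raise ValueError(f"unknown token id {token_id}")
        if self.is_watermark(token_id):
            return self.wtm_table[token_id - self.n_words]
        return self.word_table[token_id]

    def predict(self, hidden: List[float]) -> Tuple[List[float], List[float]]:
        """Score a hidden state against each space separately: (word logits, watermark logits)."""
        word_logits = [sum(h * w for h, w in zip(hidden, row)) for row in self.word_table]
        wtm_logits = [sum(h * w for h, w in zip(hidden, row)) for row in self.wtm_table]
        return word_logits, wtm_logits


emb = SplitEmbedding(n_words=5, n_wtm=3, dim=2)
print(emb.lookup(2), emb.lookup(5))  # prints [2.0, 2.0] [-1.0, -1.0]
word_logits, wtm_logits = emb.predict([1.0, 0.0])
```

In this sketch, `predict` returns one logit vector for the word space and one for the watermark space. Because the two vectors are kept apart, a watermark token never competes with word tokens inside a single softmax.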
