
Identifying the Source of Generation for Large Language Models

Bumjin Park, Jaesik Choi

TL;DR

This work addresses the problem of tracing the source documents behind text generated by large language models (LLMs) by introducing a token-level, post-hoc source identification mechanism. It formalizes the task as multi-label classification and proposes an $n$-gram Source Identifier, a non-linear MLP that maps internal LLM activations to document labels, evaluated across multiple models and datasets. The study finds that bigram representations generalize best, that performance improves with model size and depends on the layer used, and that non-linear predictors generally outperform linear ones. The approach enables provenance tagging and reliability signals for generated content without altering the LLM, with significant implications for safety, copyright protection, and trust in AI systems, particularly for providers who can supply document-source labels to end users.
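Framing source identification as multi-label classification means each token position receives a multi-hot target over the document set, presumably because the same text can appear in more than one document. The plain-Python sketch below assumes independent per-document sigmoid outputs trained with binary cross-entropy, a standard multi-label choice; the exact loss used in the paper is not stated in this summary, and the names `multi_hot` and `bce_with_logits` are hypothetical.

```python
import math


def multi_hot(doc_ids, num_docs):
    """Multi-hot target: 1.0 for every document that contains the current text, else 0.0."""
    target = [0.0] * num_docs
    for d in doc_ids:
        target[d] = 1.0
    return target


def bce_with_logits(logits, target):
    """Mean binary cross-entropy over documents, each label treated independently."""
    total = 0.0
    for z, y in zip(logits, target):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(target)


# A token whose text occurs in documents 0 and 2 out of 4 documents.
y = multi_hot([0, 2], num_docs=4)
print(y)  # [1.0, 0.0, 1.0, 0.0]
print(round(bce_with_logits([0.0, 0.0, 0.0, 0.0], y), 4))  # 0.6931
```

With all-zero logits every document gets probability 0.5, so the loss equals $\ln 2$ regardless of the target.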

Abstract

Large language models (LLMs) memorize text from many source documents. During pretraining, an LLM is trained to maximize the likelihood of text, but it is neither given the source of that text nor memorizes it. Consequently, an LLM cannot provide document information for the content it generates, and users receive no hint of its reliability, which is crucial for assessing factuality and detecting privacy infringement. This work introduces token-level source identification in the decoding step, which maps each token representation to its reference document. We propose a bigram source identifier, a multi-layer perceptron that takes two successive token representations as input for better generalization. We conduct extensive experiments on the Wikipedia and PG19 datasets with several LLMs, layer locations, and identifier sizes. The overall results indicate that token-level source identifiers are a viable way to trace documents, a problem crucial to the safe use of LLMs.
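The bigram identifier described above concatenates the representations of two successive tokens and passes them through an MLP that scores every document. The plain-Python sketch below is illustrative only: the single ReLU hidden layer, the hidden width, the random initialization, and the class name `BigramSourceIdentifier` are assumptions rather than the paper's configuration, and in practice the input vectors would be hidden states taken from a chosen layer of the frozen LLM.

```python
import math
import random


def stable_sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BigramSourceIdentifier:
    """Hypothetical MLP: concat(h_{t-1}, h_t) -> ReLU hidden layer -> one logit per document."""

    def __init__(self, d_model, hidden, num_docs, seed=0):
        rng = random.Random(seed)  # seeded for reproducible toy weights
        d_in = 2 * d_model  # two successive token representations
        self.W1 = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(num_docs)]
        self.b2 = [0.0] * num_docs

    @staticmethod
    def _affine(W, b, x):
        # Computes W @ x + b, with W stored as a list of rows.
        return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

    def logits(self, h_prev, h_curr):
        x = list(h_prev) + list(h_curr)  # bigram input of length 2 * d_model
        if len(x) != len(self.W1[0]):
            raise ValueError("concatenated input must have length 2 * d_model")
        hidden = [max(0.0, z) for z in self._affine(self.W1, self.b1, x)]  # ReLU
        return self._affine(self.W2, self.b2, hidden)

    def predict_proba(self, h_prev, h_curr):
        # Independent sigmoid per document (multi-label), not a softmax over documents.
        return [stable_sigmoid(z) for z in self.logits(h_prev, h_curr)]


ident = BigramSourceIdentifier(d_model=4, hidden=8, num_docs=3)
probs = ident.predict_proba([0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.4, 0.2])
print(len(probs))  # 3
```

Dropping the ReLU hidden layer and mapping the concatenated input directly to logits would give a linear baseline; the TL;DR reports that non-linear predictors generally outperform linear ones.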

Paper Structure

This paper contains 22 sections, 1 equation, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: An illustration of the source identification. In step (1), the GPT model is trained with language modeling to memorize documents. In step (2), the source identifier is trained to predict the documents while the GPT is frozen. Note that the pretraining data is a collection of multiple documents with known sources.
  • Figure 2: Dataset construction and $n$-gram source identifier. The current prediction location is train, and the source identifier uses $n$-gram representations as inputs.
  • Figure 3: Training accuracy for three model families at several sizes. The first row shows accuracy on the train split and the second on the test-in split. Larger models generalize better. For Llama2, the 7B model reaches higher train accuracy than the 13B model, but the 13B model generalizes better.
  • Figure 4: Accuracy on the train and test-in splits. Although train accuracy is nearly the same across layers, generalization to test-in varies by layer.
  • Figure 5: Accuracy on the train and test-in splits. Each star represents a single MLP. All $n$-gram identifiers reach similar train accuracy, but the bigram identifier generalizes best.
  • ...and 3 more figures
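The $n$-gram inputs in Figure 2 can be formed from the per-token hidden states of one layer by concatenating each token's representation with those of its $n-1$ predecessors. The sketch below assumes that positions with fewer than $n-1$ preceding tokens are simply skipped (the paper may pad them instead), and the function name `ngram_inputs` is hypothetical.

```python
def ngram_inputs(hidden_states, n):
    """Concatenate n successive token representations for every position t >= n - 1.

    hidden_states: list of per-token vectors (lists of floats) from one LLM layer.
    Returns (t, feature) pairs where feature = h[t-n+1] + ... + h[t] (list concatenation).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    windows = []
    for t in range(n - 1, len(hidden_states)):
        feature = []
        for h in hidden_states[t - n + 1 : t + 1]:
            feature.extend(h)
        windows.append((t, feature))
    return windows


# Toy 2-dimensional states for a 4-token sequence.
states = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
for t, x in ngram_inputs(states, n=2):
    print(t, x)
# 1 [0.0, 1.0, 2.0, 3.0]
# 2 [2.0, 3.0, 4.0, 5.0]
# 3 [4.0, 5.0, 6.0, 7.0]
```

Setting `n=1` reduces this to unigram inputs; Figure 5 reports that, among the $n$-gram variants, the bigram identifier generalizes best.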