Inner-Probe: Discovering Copyright-related Data Generation in LLM Architecture

Qichao Ma, Rui-Jie Zhu, Peiye Liu, Renye Yan, Fahong Zhang, Ling Liang, Meng Li, Zhaofei Yu, Zongwei Wang, Yimao Cai, Tiejun Huang

TL;DR

This paper addresses the challenge of identifying how copyrighted data in training sets influence LLM outputs. It introduces Inner-Probe, a lightweight framework that uses multi-head attention signals and an LSTM-based extractor to attribute sub-dataset contributions, and a contrastive learning module to filter non-copyrighted content. The method achieves high attribution accuracy (often >95%) and strong non-copyrighted data filtering performance (AUC up to 0.954) across multiple models and datasets, while remaining substantially more efficient than prior text- or prompt-based approaches. The work demonstrates practical applicability through real-world case studies (Books3) and extensive experiments, and outlines extensions to larger, multilingual datasets and multimodal models for broader copyright protection in deployment scenarios.

Abstract

Large Language Models (LLMs) utilize extensive knowledge databases and show powerful text generation ability. However, their reliance on high-quality copyrighted datasets raises concerns about copyright infringements in generated texts. Current research often employs prompt engineering or semantic classifiers to identify copyrighted content, but these approaches have two significant limitations: (1) they struggle to identify which specific sub-dataset (e.g., works from particular authors) influences an LLM's output; (2) they treat the entire training database as copyrighted, overlooking the non-copyrighted training data it includes. We propose Inner-Probe, a lightweight framework designed to evaluate the influence of copyrighted sub-datasets on LLM-generated texts. Unlike traditional methods relying solely on text, we discover that the results of multi-head attention (MHA) during LLM output generation provide more effective information. Thus, Inner-Probe performs sub-dataset contribution analysis using a lightweight LSTM-based network trained on MHA results in a supervised manner. Harnessing such a prior, Inner-Probe enables non-copyrighted text detection through a concatenated global projector trained with unsupervised contrastive learning. Inner-Probe demonstrates 3x improved efficiency compared to semantic model training in sub-dataset contribution analysis on Books3, achieves 15.04% to 58.7% higher accuracy over baselines on the Pile, and delivers a 0.104 increase in AUC for non-copyrighted data filtering.
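
The non-copyrighted data filtering result above is reported as AUC. As a reminder of what that number measures, here is a minimal sketch of the standard pairwise definition of ROC AUC in plain Python. It is not the authors' evaluation code: the function name `roc_auc`, the scores, and the labels are all hypothetical and chosen only for illustration.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores (assumption: higher means "more likely
# non-copyrighted"); label 1 marks non-copyrighted text, 0 copyrighted text.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 positive/negative pairs are ordered correctly
```

Under this definition, an AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so a 0.104 gain means that roughly ten percentage points more positive/negative pairs are ranked correctly.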

Paper Structure

This paper contains 33 sections, 16 equations, 12 figures, 9 tables, and 1 algorithm.

Figures (12)

  • Figure 1: (a) LLM-based community and commercial behaviors; (b) both copyright detection and dataset contribution analysis are key to supporting 'payment for data' in this transactional framework.
  • Figure 2: The range of data used in LLM training and inference. Only copyrighted sub-datasets used in LLM training are considered for contributing to LLM-generated texts. External copyrighted data can be pre-filtered during inference.
  • Figure 3: Visualization (UMAP) of the statistical differences in hidden states (MHA, FFN) (a) for input texts from 8 classes of the Pile; (b) for input texts generated from the Pile (Gao et al., 2020). In each subfigure of (a) and (b), the top row shows MHA/FFN outputs from a model (BERT) not trained on the Pile, and the bottom row shows MHA/FFN outputs across layers from an LLM (GPT-series) trained on the Pile.
  • Figure 4: Comparison of causal graphs learned from MHA and FFN for the text 'The blue whale is the largest animal on the planet.' The graph learns causality comprehensively from MHA, but far less from FFN. (a) MHA result; (b) FFN result.
  • Figure 5: Comparison of inverse covariance matrices from MHA and FFN. The FFN matrix is diagonal, indicating complete independence, which blocks FFN from modeling causality. (a) MHA result; (b) FFN result.
  • ...and 7 more figures