TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs

Cheng Wang; Xinyang Lu; See-Kiong Ng; Bryan Kian Hsiang Low

TL;DR

TRACE addresses the need for reliable source attribution of LLM outputs with a transformer-based framework that learns source-aware, contrastive embeddings. It combines source-specific semantic distillation with supervised contrastive training on SBERT-based sentence representations, then uses proximity-based inference (Hard kNN, Soft kNN, Nearest Centroid) to attribute responses to data providers. The authors report high attribution accuracy and scaling to many providers, with interpretability through evidence such as nearest principal sentences and similarity scores; they also evaluate robustness to adversarial text distortions. In practice, TRACE offers a model-agnostic, efficient way to audit LLM outputs under regulatory and privacy constraints, with room for improvement in paraphrase robustness and data-sharing realism.
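To make the three proximity-based inference rules concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the toy 2-D vectors stand in for trained sentence embeddings, cosine similarity is assumed as the proximity measure, the exponential weighting in the soft rule is a choice made here, and all function names (hard_knn, soft_knn, nearest_centroid) are hypothetical.

```python
import numpy as np


def _normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def _top_k(query, bank, k):
    """Return indices and cosine similarities of the k most similar bank rows."""
    sims = _normalize(bank) @ _normalize(query)
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]


def hard_knn(query, bank, labels, k):
    """Single-source attribution: majority vote among the k nearest neighbours."""
    idx, _ = _top_k(query, bank, k)
    votes = np.bincount(labels[idx])
    return int(np.argmax(votes))  # ties resolve to the lowest source id


def soft_knn(query, bank, labels, k, n_sources):
    """Multi-source attribution: normalized, similarity-weighted score per source."""
    idx, sims = _top_k(query, bank, k)
    weights = np.exp(sims)  # assumed weighting; keeps every weight positive
    scores = np.bincount(labels[idx], weights=weights, minlength=n_sources)
    return scores / scores.sum()


def nearest_centroid(query, bank, labels, n_sources):
    """Single-source attribution: source whose mean embedding is most similar."""
    unit = _normalize(bank)
    centroids = np.stack([unit[labels == s].mean(axis=0) for s in range(n_sources)])
    return int(np.argmax(_normalize(centroids) @ _normalize(query)))


# Toy 2-D "embeddings" for three hypothetical data providers (ids 0, 1, 2).
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                 [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]])
labels = np.array([0, 0, 1, 1, 2, 2])
query = np.array([0.8, 0.3])

print("hard kNN:", hard_knn(query, bank, labels, k=3))
print("soft kNN:", soft_knn(query, bank, labels, k=3, n_sources=3))
print("nearest centroid:", nearest_centroid(query, bank, labels, n_sources=3))
```

In this sketch, setting k=1 makes both kNN rules return the source of the single nearest neighbour. Nearest Centroid compares the query against one vector per source instead of every stored sentence, which is the usual reason to prefer it when the number of stored embeddings is large.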

Abstract

The rapid evolution of large language models (LLMs) represents a substantial leap forward in natural language understanding and generation. However, alongside these advancements come significant challenges related to the accountability and transparency of LLM responses. Reliable source attribution is essential to adhering to stringent legal and regulatory standards, including those set forth by the General Data Protection Regulation. Despite the well-established methods in source attribution within the computer vision domain, the application of robust attribution frameworks to natural language processing remains underexplored. To bridge this gap, we propose a novel and versatile TRansformer-based Attribution framework using Contrastive Embeddings called TRACE that, in particular, exploits contrastive learning for source attribution. We perform an extensive empirical evaluation to demonstrate the performance and efficiency of TRACE in various settings and show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of LLMs.

Paper Structure (35 sections, 4 equations, 4 figures, 6 tables)

Introduction
Preliminaries
Contrastive Learning and NT-Xent Loss.
Sentence Encoder.
TRACE Framework
Source-Specific Semantic Distillation
Supervised Contrastive Embedding Training for Source-Coherent Clustering
Proximity-based Inference
Hard $k$NN (Single-Source Attribution).
Soft $k$NN (Multi-Source Attribution).
Nearest Centroid (Single-Source Attribution).
Experiments
Experimental Setup
Data.
Model.
...and 20 more sections

Figures (4)

Figure 1: Illustration of TRACE framework.
Figure 2: Illustration of the attribution step in TRACE framework.
Figure 3: Visualization (using UMAP) of the embedding space before (left) and after (right) contrastive learning.
Figure 4: Contrastive loss (left) and soft $k$NN accuracy (right) with different WINDOW_SIZEs. Note that the results for hard $k$NN (regardless of the value of $k$) are identical to those of soft $k$NN when $k=1$.
