LLM Attributor: Interactive Visual Attribution for LLM Generation

Seongmin Lee; Zijie J. Wang; Aishwarya Chakravarthy; Alec Helbling; ShengYun Peng; Mansi Phute; Duen Horng Chau; Minsuk Kahng

TL;DR

This paper introduces LLM Attributor, a Python library that visualizes training data attribution (TDA) for LLM-generated text within computational notebooks, helping developers understand why a model produced a particular output. The library is built around an enhanced DataInf-style attribution score. It caches gradients and aggregates scores over multiple shuffled checkpoints to stabilize estimates. It also provides two interactive views: the Main View for token-level inspection, and the Comparison View for comparing LLM-generated text with user-provided text. The tool emphasizes practical usability, notebook compatibility, and extensibility to additional TDA methods, and it is open source to support transparent model debugging and data provenance analysis. LLM Attributor lets developers identify influential training data points and compare model outputs against user-provided text. Its goal is to improve trustworthiness and support responsible AI deployment, which the authors illustrate with usage scenarios on LLaMA2 models fine-tuned on disaster-related articles and on finance question-answer pairs.
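The attribution score builds on DataInf, which approximates the influence of training point k on a set of attributed tokens layer by layer as I(k) = sum_l (1/(n*lambda_l)) * sum_i [ (v_l . g_{l,i}) (g_{l,i} . g_{l,k}) / (lambda_l + |g_{l,i}|^2) - v_l . g_{l,k} ]. Here g_{l,i} is the gradient of training point i's loss with respect to layer l's parameters, v_l is the corresponding gradient for the attributed tokens, n is the number of training points, and lambda_l is a damping term. The block below is a minimal pure-Python sketch of that formula, not LLM Attributor's implementation. It assumes per-layer gradients are already available as lists of floats. It imitates gradient caching by precomputing each layer's training-gradient Gram matrix once, and it aggregates checkpoints with a plain mean. The function and class names, the damping heuristic, and the sign convention are all hypothetical. The sign convention negates the influence, so a positive score means upweighting the point would lower the loss of the attributed tokens.

```python
from dataclasses import dataclass
from statistics import fmean

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class LayerCache:
    """Cached training gradients for one layer (hypothetical structure)."""
    grads: list[Vector]      # g_{l,i} for each of the n training points
    gram: list[list[float]]  # gram[i][k] = g_{l,i} . g_{l,k}
    lam: float               # damping term lambda_l


def build_cache(train_grads: list[list[Vector]], damping_scale: float = 0.1) -> list[LayerCache]:
    """Precompute per-layer Gram matrices once so every query can reuse them."""
    caches = []
    for grads in train_grads:
        n, d = len(grads), len(grads[0])
        gram = [[dot(gi, gk) for gk in grads] for gi in grads]
        # Assumed damping heuristic: a fraction of the mean squared gradient entry.
        lam = damping_scale * sum(gram[i][i] for i in range(n)) / (n * d)
        caches.append(LayerCache(grads, gram, lam))
    return caches


def datainf_scores(caches: list[LayerCache], query_grads: list[Vector]) -> list[float]:
    """DataInf influence of each training point on the attributed tokens."""
    n = len(caches[0].grads)
    scores = [0.0] * n
    for cache, v in zip(caches, query_grads):
        a = [dot(v, g) for g in cache.grads]  # a_i = v . g_i
        c = [a[i] / (cache.lam + cache.gram[i][i]) for i in range(n)]
        for k in range(n):
            mixed = sum(c[i] * cache.gram[i][k] for i in range(n))  # sum_i c_i (g_i . g_k)
            scores[k] += (mixed - n * a[k]) / (n * cache.lam)
    return scores


def attribution_scores(checkpoints: list[tuple[list[LayerCache], list[Vector]]]) -> list[float]:
    """Average influence over checkpoints, then negate (assumed sign convention)."""
    per_checkpoint = [datainf_scores(caches, query) for caches, query in checkpoints]
    # Positive result: upweighting the point would lower the attributed tokens' loss.
    return [-fmean(column) for column in zip(*per_checkpoint)]
```

In this sketch the Gram matrix depends only on the training gradients. Once the cache is built, attributing a new set of tokens therefore costs O(nd + n^2) per layer, and no training gradients are recomputed.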

Abstract

While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library offers a new way to quickly attribute an LLM's text generation to training data points, so that users can inspect model behaviors, enhance the model's trustworthiness, and compare model-generated text with user-provided text. We describe the visual and interactive design of our tool and highlight usage scenarios for LLaMA2 models fine-tuned with two different datasets: online articles about recent disasters and finance-related question-answer pairs. Thanks to LLM Attributor's broad support for computational notebooks, users can easily integrate it into their workflow to interactively visualize attributions of their models. For easier access and extensibility, we open-source LLM Attributor at https://github.com/poloclub/LLM-Attribution. The video demo is available at https://youtu.be/mIG2MDQKQxM.

Paper Structure (13 sections, 3 figures)

Introduction
Related Work
Training Data Attribution
Visualization for LLM Attribution
System Design
Data Attribution Score
Main View
Comparison View
Usage Scenarios
Understand Problematic Generation
Identify Sources of Generated Text
Conclusion and Future Work
Broader Impact

Figures (3)

Figure 1: LLM Attributor enables LLM developers to visualize the training data attribution of their models in computational notebooks. In this example, our user Megan is surprised that an LLM fine-tuned on a disaster dataset often attributes the 2023 Hawaii wildfires to directed-energy weapons, as in a conspiracy theory, and only occasionally to dry weather. (A) The tokens being attributed, which Megan selects interactively, are displayed side by side for visual comparison. (B) Training data points with the highest attribution scores are presented as a list by default and can be interactively expanded to the full source text, revealing that the data point most responsible for generating "directed-energy weapons" is an X post that spreads the conspiracy theory. (C) Keyword Summary shows important words in the displayed training data. (D) Score Distribution over the entire training data is visualized as a histogram, enabling both high-level comparison over the entire data and low-level analysis of individual data points. Below, the training data points with the lowest attribution scores are visualized in the same way.
Figure 2: Main View visualizes training data attribution for text generated by an LLM. (A) Users interactively select tokens to attribute by running select_tokens. (B) Running the attribute function launches the Main View to visualize the most positively and negatively attributed training data points for the selected tokens, important words in those data points, and the distribution of attribution scores over the entire training data.
Figure 3: For easy comparative analysis, users can interactively add, delete, and edit words in LLM-generated text using the edit_text function.
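Figures 1 and 2 describe what the Main View shows for a set of attribution scores: the most positively and most negatively attributed training points, a keyword summary of those points, and a histogram of scores over the whole training set. The block below is a minimal pure-Python sketch of one way to derive such a summary from a list of scores. It does not describe how LLM Attributor computes these components. The function names, the stopword list, the equal-width binning, and the frequency-based keyword ranking are all hypothetical stand-ins, because the captions do not specify how the Keyword Summary or the histogram are computed.

```python
import re
from collections import Counter

# Hypothetical stopword list used only for this illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "on", "for", "that"}


def rank_training_points(scores: list[float], k: int = 3) -> tuple[list[int], list[int]]:
    """Indices of the k most positively and k most negatively attributed points."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    top = order[::-1][:k]  # highest score first
    bottom = order[:k]     # lowest score first
    return top, bottom


def score_histogram(scores: list[float], bins: int = 10) -> list[int]:
    """Equal-width histogram of scores over the whole training set."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when every score is equal
    counts = [0] * bins
    for s in scores:
        counts[min(int((s - lo) / width), bins - 1)] += 1  # the maximum goes in the last bin
    return counts


def keyword_summary(texts: list[str], n: int = 5) -> list[str]:
    """Most frequent non-stopwords in the displayed training data points."""
    words = (w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    return [w for w, _ in Counter(w for w in words if w not in STOPWORDS).most_common(n)]
```

For example, rank_training_points([0.5, -2.0, 3.0, 0.0, -1.0], k=2) returns ([2, 0], [1, 4]). Indices 2 and 0 hold the two highest scores, and indices 1 and 4 hold the two lowest.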
