AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

Fengyuan Liu, Nikhil Kandpal, Colin Raffel

TL;DR

AttriBoT addresses the prohibitive cost of leave-one-out (LOO) context attribution in large language models with a bag of tricks that combines key-value caching, hierarchical attribution, and proxy-model techniques. Together, these methods reduce the computational burden while preserving faithfulness to the target model's LOO attributions, achieving a >$300\times$ speedup and making attribution roughly $30\times$ faster than generating the response itself in OBQA settings. The approach is Pareto-optimal across multiple datasets and model families, with strong correlations to full LOO and alignment with human-annotated important spans (e.g., HotpotQA). The work provides a flexible, composable framework and an open-source implementation that enables scalable interpretability of LLMs and supports future efficiency-driven attribution research.

Abstract

The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while remaining more faithful to a target model's LOO error than prior context attribution methods. This stark increase in performance makes computing context attributions for a given response 30x faster than generating the response itself, empowering real-world applications that require computing attributions at scale. We release a user-friendly and efficient implementation of AttriBoT to enable efficient LLM interpretability as well as encourage future development of efficient context attribution methods.
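
To make the LOO definition in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical scoring callable `log_likelihood(sources)` that returns the log-likelihood of a fixed response given a list of context sources; the toy scorer below is a stand-in for a real LLM. The sign convention (full score minus ablated score) is also an assumption.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: maps a list of context sources to
# log p(response | sources, query) for one fixed response.
LogLikelihood = Callable[[Sequence[str]], float]


def loo_attributions(sources: Sequence[str],
                     log_likelihood: LogLikelihood) -> List[float]:
    """Score each source by the drop in log-likelihood when it is removed."""
    full = log_likelihood(sources)  # one pass with the full context
    scores = []
    for i in range(len(sources)):
        # Rebuild the context without source i and rescore the response.
        ablated = [s for j, s in enumerate(sources) if j != i]
        scores.append(full - log_likelihood(ablated))
    return scores


def toy_log_likelihood(sources: Sequence[str]) -> float:
    """Toy stand-in for an LLM: the response is likelier if 'Paris' is in context."""
    bonus = 4.0 if any("Paris" in s for s in sources) else 0.0
    return -5.0 + bonus


sources = [
    "The capital of France is Paris.",
    "Bananas are yellow.",
    "The Seine flows north-west.",
]
scores = loo_attributions(sources, toy_log_likelihood)
print(scores)
```

With n sources, this naive loop needs n + 1 scoring passes over nearly the full context, which is the cost that the caching, hierarchical, and proxy-model techniques aim to reduce.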

Paper Structure

This paper contains 58 sections, 9 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: We empirically test the assumptions underlying AttriBoT's methods on examples from HotpotQA. Left: The attribution scores of small proxy models ranging from 1B to 8B parameters correlate highly with the attribution scores of a 70B-parameter target model, implying that attributions from smaller models can be a reliable proxy for those from a target model. Middle: Paragraph-level attribution scores correlate extremely well ($R = 0.97$) with the sum of the sentence-level attribution scores in a given paragraph, suggesting that hierarchical attribution can provide an effective means of pruning a large amount of irrelevant context. Right: Proxy models can effectively prune unnecessary sources from a context, achieving a recall of $90\%$ when keeping only half of the sources.
  • Figure 2: We plot the mean average precision compared to attributions of the target model against the GPU time for AttriBoT and a variety of baselines using Llama 3.1 70B Instruct as the target model and smaller Llama instruct variants as proxy models. Across all three datasets, AttriBoT is consistently Pareto-optimal over multiple orders of magnitude.
  • Figure 3: Plot showing accuracy vs. efficiency tradeoff of proxy modeling. We find that smaller proxy models produce attributions that are less faithful to the target model's attributions.
  • Figure 4: Plot showing the accuracy vs. efficiency tradeoff of hierarchical attribution with and without the use of a proxy model. Varying the size of the proxy model and tuning the number of source groups to retain, $\beta$, both effectively trade attribution faithfulness for speed.
  • Figure 5: Plot showing the accuracy vs. efficiency tradeoff of proxy model pruning. We find that varying the size of the proxy model and the fraction of sources to retain, $\alpha$, effectively trades attribution faithfulness for speed.
  • ...and 6 more figures
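
The hierarchical attribution idea referenced in the captions of Figures 1 and 4 can be sketched as a two-stage procedure: score paragraph-level groups first, retain the top $\beta$ groups, then score sentences only within the retained groups. This is an illustrative sketch under stated assumptions, not the released implementation: the `log_likelihood` callable, the nested-list context layout, and the toy scorer are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: log-likelihood of a fixed response given a flat
# list of context sentences.
LogLikelihood = Callable[[Sequence[str]], float]


def flatten(paragraphs: Sequence[Sequence[str]]) -> List[str]:
    return [s for p in paragraphs for s in p]


def hierarchical_loo(paragraphs: Sequence[Sequence[str]],
                     log_likelihood: LogLikelihood,
                     beta: int) -> Dict[Tuple[int, int], float]:
    """Return sentence-level LOO scores for sentences in the top-beta paragraphs."""
    full = log_likelihood(flatten(paragraphs))

    # Stage 1: leave one whole paragraph out at a time.
    group_scores = []
    for g in range(len(paragraphs)):
        kept = [p for k, p in enumerate(paragraphs) if k != g]
        group_scores.append(full - log_likelihood(flatten(kept)))

    # Retain the beta highest-scoring paragraphs.
    order = sorted(range(len(paragraphs)),
                   key=lambda g: group_scores[g], reverse=True)
    retained = order[:beta]

    # Stage 2: sentence-level LOO, restricted to retained paragraphs.
    scores: Dict[Tuple[int, int], float] = {}
    for g in retained:
        for i in range(len(paragraphs[g])):
            ablated = [
                [s for j, s in enumerate(p) if not (k == g and j == i)]
                for k, p in enumerate(paragraphs)
            ]
            scores[(g, i)] = full - log_likelihood(flatten(ablated))
    return scores


def toy_log_likelihood(sentences: Sequence[str]) -> float:
    """Toy stand-in for an LLM scorer."""
    score = -6.0
    if any("Paris" in s for s in sentences):
        score += 4.0
    if any("Seine" in s for s in sentences):
        score += 1.0
    return score


paragraphs = [
    ["Bananas are yellow.", "Apples are red."],
    ["Paris is the capital of France.", "It lies on the Seine."],
    ["Cats sleep a lot."],
]
result = hierarchical_loo(paragraphs, toy_log_likelihood, beta=1)
print(result)
```

Sentences in pruned paragraphs receive no score here; treating them as zero is one possible convention. The stage-1 scorer could in principle be a smaller proxy model, which is the combination Figure 4 compares, though this sketch uses a single scorer for both stages.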