llmSHAP: A Principled Approach to LLM Explainability
Filip Naudot, Tobias Sundqvist, Timotheus Kampik
TL;DR
llmSHAP systematically analyzes how stochastic LLM decoding affects Shapley-value explainability and introduces deterministic variants (e.g., cache-based Shapley) that restore axiomatic guarantees while offering speedups. By formalizing the setup, its axiomatic implications, and its complexity, the work maps clear trade-offs between fidelity to exact Shapley attributions, inference speed, and principle attainment for LLM-based decision support. Empirical results on a disease-symptom task show that cache-based attribution remains stable across feature counts, while sliding-window and counterfactual approaches trade off speed against axiom satisfaction. The study provides actionable guidance and open-source tooling for practitioners designing explainable LLM systems, and points to future directions such as applying attribution to internal chain-of-thought steps.
Abstract
Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that guarantees the satisfaction of several desirable principles, assuming deterministic inference. We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, stochastic (non-deterministic). We then demonstrate when we can and cannot guarantee Shapley value principle satisfaction across different implementation variants applied to LLM-based decision support, and analyze how the stochastic nature of LLMs affects these guarantees. We also highlight trade-offs between the speed of explainable inference, agreement with exact Shapley value attributions, and principle attainment.
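To make the role of determinism concrete, the following is a minimal sketch of exact Shapley attribution over a stochastic value function, with and without a cache. It is an illustration under stated assumptions, not the llmSHAP implementation: the names `exact_shapley`, `make_cached`, and `noisy_value`, the three symptom features, and their base scores are all hypothetical, and the Gaussian noise merely stands in for stochastic LLM decoding. The sketch assumes that "cache-based" means each coalition is evaluated once and its value reused.

```python
import random
from itertools import combinations
from math import factorial


def exact_shapley(features, value):
    """Exact Shapley values for a set function `value` over coalitions.

    phi_i = sum over S subset of N without i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def make_cached(value):
    """Wrap a stochastic value function so each coalition is evaluated once."""
    cache = {}

    def cached(coalition):
        key = frozenset(coalition)
        if key not in cache:
            cache[key] = value(key)
        return cache[key]

    return cached


# Hypothetical stand-in for a stochastic LLM score (assumption, not real data).
_rng = random.Random(0)
_base = {"fever": 0.5, "cough": 0.3, "rash": 0.2}


def noisy_value(coalition):
    # Additive base score plus Gaussian noise mimicking sampling variability.
    return sum(_base[f] for f in coalition) + _rng.gauss(0.0, 0.05)


features = ["fever", "cough", "rash"]
cached_value = make_cached(noisy_value)

phi = exact_shapley(features, cached_value)
phi_again = exact_shapley(features, cached_value)

# Efficiency: attributions sum to v(N) - v(empty set) for a deterministic v.
efficiency_gap = sum(phi.values()) - (
    cached_value(frozenset(features)) - cached_value(frozenset())
)
```

With the cache, the value function behaves as a fixed set function, so the efficiency gap vanishes up to floating-point error and repeated runs return identical attributions. Calling `exact_shapley(features, noisy_value)` directly re-samples each coalition every time it is queried, so neither property is guaranteed in that case.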
