CELL your Model: Contrastive Explanations for Large Language Models

Ronny Luss, Erik Miehling, Amit Dhurandhar

TL;DR

The paper addresses the challenge of explaining LLM outputs in a black-box setting by introducing contrastive explanations that compare the original response to responses from perturbed prompts. It formalizes the problem as a constrained search using a user-defined scoring function and a similarity measure, and proposes two algorithms, m-CELL and CELL, to efficiently identify informative contrasts under a query budget. Empirical evaluations on MIC and XSum with Llama models and an infiller demonstrate that CELL-based approaches yield higher-quality contrasts and robust performance across tasks, including open-text generation and conversational explanation. The work also showcases practical application to conversations by evaluating submaxims and demonstrates how contrastive prompts can serve as training data for improved dialogue systems. Overall, this framework enables actionable, budget-conscious explanations of LLM behavior without requiring internal model access, with potential impact on model transparency and user trust in complex generative systems.

Abstract

The advent of black-box deep neural network classification models has sparked the need to explain their decisions. However, in the case of generative AI, such as large language models (LLMs), there is no class prediction to explain. Rather, one can ask why an LLM output a particular response to a given prompt. In this paper, we answer this question by proposing a contrastive explanation method that requires only black-box/query access. Our explanations suggest that an LLM outputs a reply to a given prompt because, if the prompt were slightly modified, the LLM would have given a different response that is either less preferable or contradicts the original response. The key insight is that contrastive explanations require only a scoring function that has meaning to the user, and not necessarily a specific real-valued quantity (viz. class label). To this end, we offer a novel budgeted algorithm, our main algorithmic contribution, which intelligently creates contrasts based on such a scoring function while adhering to a query budget, which is necessary for longer contexts. We show the efficacy of our method on important natural language tasks such as open-text generation and chatbot conversations.

Paper Structure

This paper contains 19 sections, 1 equation, 10 figures, 6 tables, 5 algorithms.

Figures (10)

  • Figure 1: Contrastive explanations for natural language generation by meta-llama/Llama-3.1-8B-Instruct. Colors mark what is changed between the input and contrastive prompts. These explanations suggest that the input prompt generated the input response because, if the highlighted changes were made, the new contrastive prompt would generate a different response that contradicts the input response. Prompts are taken from the Moral Integrity Corpus (MIC) [micdata].
  • Figure 2: Illustration of the CELL and m-CELL algorithms. Both algorithms can be summarized as an iterative process that repeats four steps: a) select substrings of the prompt to search, b) generate perturbed prompts (mask and infill), c) generate responses for each perturbed prompt (via the LLM), and d) score each perturbed prompt/response pair. The main difference between the budgeted method, CELL, and the myopic method, m-CELL, is in the selection block: CELL augments the search process with a prompt seed generation step (see Algorithm \ref{algo:cem_llm_budget_summary} for details). CELL's search is an iterative loop subject to an inner-loop budget before repeating the prompt seed generation step, whereas the myopic method's search simply enumerates over substrings.
  • Figure 3: Average number of model calls for CELL and m-CELL applied to text summarization. Shaded regions denote standard error.
  • Figure 4: Example of explaining conversations. Colors mark what is changed between the input and contrastive prompts (changes are restricted to the assistant's turns).
  • Figure 5: Prompt used for Prometheus2.
  • ...and 5 more figures
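
The four-step loop described in the Figure 2 caption (select, mask and infill, generate, score) can be illustrated with a small sketch of a myopic, m-CELL-style search. This is not the authors' implementation: the function name `myopic_contrast_search`, the word-level spans, the acceptance threshold, and the toy `toy_llm`, `toy_infill`, and `toy_score` stand-ins are all assumptions made for illustration. In practice, the LLM, infiller, and scoring function would be real models chosen by the user.

```python
from typing import Callable, List, Optional, Tuple


def myopic_contrast_search(
    prompt: str,
    llm: Callable[[str], str],
    infill: Callable[[List[str], int, int], List[str]],
    score: Callable[[str, str, str, str], float],
    threshold: float,
    max_queries: int,
    span_len: int = 1,
) -> Tuple[Optional[str], Optional[str], int]:
    """Hypothetical myopic search: enumerate word spans left to right.

    Returns (contrastive_prompt, contrastive_response, llm_queries_used),
    or (None, None, llm_queries_used) if no contrast is found in budget.
    """
    words = prompt.split()
    base_response = llm(prompt)  # one query for the original prompt
    queries = 1
    for start in range(len(words) - span_len + 1):
        if queries >= max_queries:
            break  # query budget exhausted
        # (a) select a span, (b) mask and infill it
        perturbed = " ".join(infill(words, start, start + span_len))
        if perturbed == prompt:
            continue  # infiller made no change; do not spend a query
        # (c) generate a response for the perturbed prompt
        response = llm(perturbed)
        queries += 1
        # (d) score the perturbed prompt/response against the original
        if score(prompt, base_response, perturbed, response) >= threshold:
            return perturbed, response, queries
    return None, None, queries


# Toy stand-ins (assumptions, not real models).
def toy_llm(p: str) -> str:
    return "yes" if "good" in p.split() else "no"


SWAPS = {"good": "bad"}


def toy_infill(words: List[str], start: int, end: int) -> List[str]:
    # Replace words in the selected span using a fixed swap table.
    return words[:start] + [SWAPS.get(w, w) for w in words[start:end]] + words[end:]


def toy_score(p0: str, r0: str, p1: str, r1: str) -> float:
    # 1.0 when the response changed, else 0.0 (a crude "contradiction" proxy).
    return 1.0 if r0 != r1 else 0.0


result = myopic_contrast_search(
    "is this a good idea", toy_llm, toy_infill, toy_score,
    threshold=0.5, max_queries=10,
)
print(result)
```

According to the caption, the budgeted CELL variant differs in the selection step: it adds a prompt seed generation step and an inner-loop budget. The sketch above does not model that.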