Using LLMs for Explaining Sets of Counterfactual Examples to Final Users

Arturo Fredes, Jordi Vitria

TL;DR

The paper tackles explainability for tabular-data classifiers by turning sets of counterfactuals into natural-language explanations using large language models. It introduces a four-step pipeline (counterfactual generation with DiCE, abductive cause extraction via LLMs, cause evaluation and ranking through code-assisted checks, and final explanation synthesis) that produces user-friendly guidance on how to change a classifier's outcome. Closed-loop evaluation and diverse prompting approaches (including Tree of Thought) are used to assess validity, cause coverage, and explanation quality on the Adult dataset, showing promising results while acknowledging limitations and the need for broader human evaluation. Overall, the work demonstrates a viable way to turn multiple counterfactuals into actionable, interpretable guidance for end users, with room for refinement across datasets and evaluation methodologies.

Abstract

Causality is vital for understanding true cause-and-effect relationships between variables within predictive models, rather than relying on mere correlations, making it highly relevant in the field of Explainable AI. In an automated decision-making scenario, causal inference methods can analyze the underlying data-generation process, enabling explanations of a model's decision by manipulating features and creating counterfactual examples. These counterfactuals explore hypothetical scenarios where a minimal number of factors are altered, providing end-users with valuable information on how to change their situation. However, interpreting a set of multiple counterfactuals can be challenging for end-users who are not used to analyzing raw data records. In our work, we propose a novel multi-step pipeline that uses counterfactuals to generate natural language explanations of actions that will lead to a change in outcome in classifiers of tabular data using LLMs. This pipeline is designed to guide the LLM through smaller tasks that mimic human reasoning when explaining a decision based on counterfactual cases. We conducted various experiments using a public dataset and proposed a method of closed-loop evaluation to assess the coherence of the final explanation with the counterfactuals, as well as the quality of the content. Results are promising, although further experiments with other datasets and human evaluations should be carried out.
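
To make the cause evaluation and ranking step of the pipeline more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a deliberately simplified notion of "cause": a single feature whose value differs between the original record and a counterfactual. In the paper, causes are proposed by an LLM and checked with generated code. The function names (`changed_features`, `rank_causes`) and the example record are hypothetical; only the feature names are borrowed from the Adult dataset.

```python
from collections import Counter


def changed_features(original, counterfactual):
    """Return {feature: (old_value, new_value)} for each feature that differs."""
    return {
        name: (original[name], new_value)
        for name, new_value in counterfactual.items()
        if original.get(name) != new_value
    }


def rank_causes(original, counterfactuals):
    """Rank features by how many counterfactuals change them.

    Simplifying assumption: one changed feature equals one candidate cause.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter()
    for cf in counterfactuals:
        counts.update(changed_features(original, cf).keys())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical record and counterfactuals (illustrative values only).
original = {
    "age": 30,
    "education": "HS-grad",
    "hours_per_week": 40,
    "occupation": "Sales",
}
counterfactuals = [
    {**original, "education": "Masters"},
    {**original, "education": "Doctorate", "hours_per_week": 50},
    {**original, "hours_per_week": 60, "occupation": "Exec-managerial"},
]

ranking = rank_causes(original, counterfactuals)
for feature, support in ranking:
    print(f"{feature}: changed in {support} of {len(counterfactuals)} counterfactuals")
```

A ranking of this kind could then be handed to the final synthesis step, so that the explanation leads with the changes supported by the most counterfactuals.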

Paper Structure (16 sections, 2 figures, 9 tables)

Figures (2)

  • Figure 1: Using LLMs to generate a natural language explanation from a set of counterfactuals, which will be more easily interpreted by end users
  • Figure 2: Scheme of the different steps taken to generate the final explanation