Table of Contents
Fetching ...

From Feature Importance to Natural Language Explanations Using LLMs with RAG

Sule Tekkesinoglu, Lars Kunze

TL;DR

The paper presents a traceable question-answering framework that grounds LLM explanations in an external knowledge repository of model outputs and feature importances. It uses subtractive counterfactual reasoning to compute feature importance and employs a RAG workflow with a system prompt that enforces social, causal, selective, and contrastive explanation attributes. The approach is demonstrated on scene understanding tasks with semantic segmentation and Places365 data, showing that the generated explanations exhibit sociability, causal reasoning, selective focus, and contrastive reasoning. This work advances human-friendly, faithful explanations for vision-based AI systems, with potential impact on safety-critical applications requiring transparent decision-making.

Abstract

As machine learning becomes increasingly integral to autonomous decision-making processes involving human interaction, the necessity of comprehending the model's outputs through conversational means increases. Most recently, foundation models are being explored for their potential as post hoc explainers, providing a pathway to elucidate the decision-making mechanisms of predictive models. In this work, we introduce traceable question-answering, leveraging an external knowledge repository to inform the responses of Large Language Models (LLMs) to user queries within a scene understanding task. This knowledge repository comprises contextual details regarding the model's output, containing high-level features, feature importance, and alternative probabilities. We employ subtractive counterfactual reasoning to compute feature importance, a method that entails analysing output variations resulting from decomposing semantic features. Furthermore, to maintain a seamless conversational flow, we integrate four key characteristics - social, causal, selective, and contrastive - drawn from social science research on human explanations into a single-shot prompt, guiding the response generation process. Our evaluation demonstrates that explanations generated by the LLMs encompassed these elements, indicating its potential to bridge the gap between complex model outputs and natural language expressions.

From Feature Importance to Natural Language Explanations Using LLMs with RAG

TL;DR

The paper presents a traceable question-answering framework that grounds LLM explanations in an external knowledge repository of model outputs and feature importances. It uses subtractive counterfactual reasoning to compute feature importance and employs a RAG workflow with a system prompt that enforces social, causal, selective, and contrastive explanation attributes. The approach is demonstrated on scene understanding tasks with semantic segmentation and Places365 data, showing that the generated explanations exhibit sociability, causal reasoning, selective focus, and contrastive reasoning. This work advances human-friendly, faithful explanations for vision-based AI systems, with potential impact on safety-critical applications requiring transparent decision-making.

Abstract

As machine learning becomes increasingly integral to autonomous decision-making processes involving human interaction, the necessity of comprehending the model's outputs through conversational means increases. Most recently, foundation models are being explored for their potential as post hoc explainers, providing a pathway to elucidate the decision-making mechanisms of predictive models. In this work, we introduce traceable question-answering, leveraging an external knowledge repository to inform the responses of Large Language Models (LLMs) to user queries within a scene understanding task. This knowledge repository comprises contextual details regarding the model's output, containing high-level features, feature importance, and alternative probabilities. We employ subtractive counterfactual reasoning to compute feature importance, a method that entails analysing output variations resulting from decomposing semantic features. Furthermore, to maintain a seamless conversational flow, we integrate four key characteristics - social, causal, selective, and contrastive - drawn from social science research on human explanations into a single-shot prompt, guiding the response generation process. Our evaluation demonstrates that explanations generated by the LLMs encompassed these elements, indicating its potential to bridge the gap between complex model outputs and natural language expressions.
Paper Structure (15 sections, 1 equation, 8 figures, 3 tables)

This paper contains 15 sections, 1 equation, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Integration of post-hoc explainability approach with LLMs for traceable question-answering. The code is available at https://github.com/suletekkesinoglu/XAI_LLM_RAG
  • Figure 2: A) Semantic segmentation results by Deeplab v3+. Semantic features are leveraged in the decomposition-based explanation approach. B) The visual explanation for the scene 'parking lot' where the feature importance values below (5) are greyed out.
  • Figure 3: System prompt devised to guide the response generation process.
  • Figure 4: Dialogue generated for the 'parking lot' scenario by GPT-4.
  • Figure 5: Vader sentiment analysis results across all responses.
  • ...and 3 more figures