In-Context Explainers: Harnessing LLMs for Explaining Black Box Models

Nicholas Kroeger; Dan Ley; Satyapriya Krishna; Chirag Agarwal; Himabindu Lakkaraju

TL;DR

The paper investigates whether large language models (LLMs) can use in-context learning (ICL) to generate post hoc explanations for other predictive models. It introduces the In-Context Explainers framework, built around two prompting strategies, Perturb ICL (with its Perturb+Guide variant) and Explain ICL, that elicit faithful natural language explanations. In experiments on real-world tabular and text datasets with multiple models and GPT variants, LLM-derived explanations reach faithfulness levels comparable to traditional post hoc explainers, and the Explain ICL approach can mimic several existing explanation methods. These findings suggest a practical and scalable role for LLMs in model interpretability and open up future research directions for LLM-based explanations of complex predictors.

Abstract

Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding. One of the primary reasons for the adaptability of LLMs in such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks by simply using a few task samples in the prompt. Despite its effectiveness in enhancing the performance of LLMs on diverse language and tabular tasks, ICL has not been thoroughly explored for its potential to generate post hoc explanations. In this work, we carry out one of the first explorations to analyze the effectiveness of LLMs in explaining other complex predictive models using ICL. To this end, we propose a novel framework, In-Context Explainers, comprising three novel approaches that exploit the ICL capabilities of LLMs to explain the predictions made by other predictive models. We conduct extensive analysis with these approaches on real-world tabular and text datasets and demonstrate that LLMs are capable of explaining other predictive models similar to state-of-the-art post hoc explainers, opening up promising avenues for future research into LLM-based post hoc explanations of complex predictive models.

Paper Structure (13 sections, 4 equations, 20 figures, 8 tables)

Introduction
Related Work
Our Framework: In-Context Explainers
Perturb ICL
Explain ICL
Experimental Evaluation
Datasets and Experimental Setup
Results
Conclusion
Appendix: Additional results and Experimental details
Additional Experimental Details
Additional Results
LLM Replies

Figures (20)

Figure 1: Overview of the in-context explanation generation and evaluation process. Given a dataset and a model to explain, we introduce novel ICL strategies to generate explanations of model predictions using LLMs. The resulting LLM-based explanations are then parsed, and their faithfulness is evaluated using diverse metrics.
Figure 2: Sample serialization template for the Recidivism dataset with six features.
Figure 3: A sample prompt generated using our proposed Perturb ICL (P-ICL) prompting strategy.
Figure 4: A sample prompt generated using the Perturb+Guide ICL (PG-ICL) prompting strategy. Note that the Context, Dataset, and Question are the same as in Fig. 3.
Figure 5: An example prompt generated using the proposed Explain ICL (E-ICL) prompting strategy.
...and 15 more figures
