LLMs for Explainable AI: A Comprehensive Survey

Ahsan Bilal, David Ebert, Beiyu Lin

TL;DR

Large Language Models (LLMs) can translate complex model behavior into human-friendly narratives, enabling explainability in critical domains. This survey synthesizes three explanatory paradigms (post-hoc explanations, intrinsic interpretability, and human-centered explanations) along with evaluation metrics, benchmark datasets, and real-world applications. It reviews major methods (IG/LIME/SHAP, Chain-of-Thought, ReAct), discusses evaluation challenges (faithfulness versus plausibility), and catalogs domain datasets (e-SNLI, WorldTree, HateXplain), highlighting current limitations such as privacy, bias, and data integration. The paper also outlines future directions, including automation with user feedback, multimodal explanations, and cross-disciplinary collaboration to improve the trustworthiness and adoption of XAI via LLMs.

Abstract

Large Language Models (LLMs) offer a promising approach to enhancing Explainable AI (XAI): they transform complex machine learning outputs into easy-to-understand narratives, make model predictions more accessible to users, and help bridge the gap between sophisticated model behavior and human interpretability. AI models, such as state-of-the-art neural networks and deep learning models, are often seen as "black boxes" due to a lack of transparency. Because users cannot fully understand how these models reach their conclusions, they have difficulty trusting the models' decisions, which leads to less effective decision-making, reduced accountability, and potential biases that remain unclear. The challenge is therefore to develop explainable AI (XAI) models that earn users' trust and provide insight into how models generate their outputs. With the development of LLMs, we explore the possibility of using these human-language-based models for model explainability. This survey provides a comprehensive overview of existing approaches to LLMs for XAI and of evaluation techniques for LLM-generated explanations, discusses the corresponding challenges and limitations, and examines real-world applications. Finally, we discuss future directions, emphasizing the need for more interpretable, automated, user-centric, and multidisciplinary approaches to XAI via LLMs.

Paper Structure

This paper contains 22 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Explainable AI and Its Intersection with LLMs
  • Figure 2: Techniques for explainability in Large Language Models (LLMs). The diagram identifies three broad categories of explanation approaches (Post-hoc Explanation, Intrinsic Interpretability, and Human-Centered Explanations) and gives example methods for each category.
  • Figure 3: Summary of the challenges of XAI
  • Figure 4: Saliency Comparison of Models by koo2024benchmarking
  • Figure 5: Future directions for improving explainability in AI using LLMs