Tell Me a Story! Narrative-Driven XAI with Large Language Models
David Martens, James Hinns, Camille Dams, Mark Vergouwen, Theodoros Evgeniou
TL;DR
XAIstories introduces narrative-based explanations that pair SHAP or counterfactual (CF) explanations with large language models to produce human-friendly narratives. The study develops SHAPstories and CFstories by prompting GPT-4 (with some experiments using other LLMs) and evaluates them in four surveys spanning lay users and data scientists. The surveys indicate gains in convincingness, ease, speed, and understanding of AI predictions. In image and tabular domains, CFstories are rated highly convincing and substantially speed up narrative creation, while SHAPstories improve comprehension and decision-making, notably in a credit scoring scenario. The findings support narrative explanations as a practical way to increase transparency, trust, and decision quality when deploying black-box AI in real-world settings, subject to acknowledged limitations and the need for domain-specific validation.
Abstract
In many AI applications today, the predominance of black-box machine learning models, due to their typically higher accuracy, amplifies the need for Explainable AI (XAI). Existing XAI approaches, such as the widely used SHAP values or counterfactual (CF) explanations, are arguably often too technical for users to understand and act upon. To enhance comprehension of explanations of AI decisions and the overall user experience, we introduce XAIstories, which leverage Large Language Models to provide narratives about how AI predictions are made: SHAPstories do so based on SHAP explanations, while CFstories do so for CF explanations. We study the impact of our approach on users' experience and understanding of AI predictions. Our results are striking: over 90% of the surveyed general audience finds the narratives generated by SHAPstories convincing. Data scientists primarily see the value of SHAPstories in communicating explanations to a general audience, with 83% of data scientists indicating they are likely to use SHAPstories for this purpose. In an image classification setting, more than 75% of the participants consider CFstories at least as convincing as their own crafted stories. CFstories additionally bring a tenfold speed gain in creating a narrative. We also find that, in the credit scoring setting we test, SHAPstories help users summarize and understand AI decisions more accurately: users answer comprehension questions correctly significantly more often than when only SHAP values are provided. The results thereby suggest that XAIstories may significantly help in explaining and understanding AI predictions, ultimately supporting better decision-making in various applications.
