Don't Just Translate, Agitate: Using Large Language Models as Devil's Advocates for AI Explanations
Ashley Suh, Kenneth Alperin, Harry Li, Steven R Gomez
TL;DR
The paper addresses the risk that using LLMs to translate XAI outputs into natural language misleads users into overconfidence and superficial engagement. It argues for reimagining LLMs as devil's advocates that actively interrogate explanations by surfacing uncertainty, biases, and counterfactuals rather than simply translating them. The authors review current findings showing that narrative explanations can be persuasive without necessarily being more informative or trustworthy, and they propose adversarial prompting strategies and persona-based, user-aware workflows to foster critical engagement. This shift aims to improve decision support by reducing overreliance and promoting deeper understanding of AI explanations; future work focuses on dynamic depth, counterfactual reasoning, and trust calibration across domains. The work highlights practical implications for XAI design, suggesting that adversarial LLM roles may better align explanations with real-world decision needs.
Abstract
This position paper highlights a growing trend in Explainable AI (XAI) research in which Large Language Models (LLMs) are used to translate outputs from explainability techniques, such as feature-attribution weights, into natural language explanations. While this approach may improve accessibility or readability for users, recent findings suggest that translating XAI outputs into human-like explanations does not necessarily enhance user understanding and may instead lead to overreliance on AI systems. When LLMs summarize XAI outputs without surfacing model limitations, uncertainties, or inconsistencies, they risk reinforcing the illusion of interpretability rather than fostering meaningful transparency. We argue that, instead of merely translating XAI outputs, LLMs should serve as constructive agitators, or devil's advocates, whose role is to actively interrogate AI explanations by presenting alternative interpretations, potential biases, training data limitations, and cases where the model's reasoning may break down. In this role, LLMs can help users engage critically with AI systems and their generated explanations, with the potential to reduce overreliance caused by misinterpreted or specious explanations.
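To make the contrast between translating and agitating concrete, the following is a minimal sketch of how the two prompt styles might differ for the same feature-attribution output. It is an illustration under stated assumptions, not the authors' implementation: the feature names, weights, prompt wording, and function names are all hypothetical, and no particular LLM API is assumed (the sketch only builds the prompt strings). The four critique axes are taken from the abstract.

```python
from typing import Dict, List

# Hypothetical feature-attribution weights for a single prediction.
# Feature names and values are invented for illustration only.
ATTRIBUTIONS: Dict[str, float] = {
    "income": 0.42,
    "age": -0.17,
    "zip_code": 0.31,
}

# Critique axes named in the abstract; the phrasing here is an assumption.
CRITIQUE_AXES: List[str] = [
    "alternative interpretations of these attributions",
    "potential biases they may reflect",
    "training data limitations that could affect them",
    "cases where the model's reasoning may break down",
]


def format_attributions(attributions: Dict[str, float]) -> str:
    """Render one line per feature, largest absolute weight first."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return "\n".join(f"- {name}: {weight:+.2f}" for name, weight in ranked)


def translate_prompt(attributions: Dict[str, float]) -> str:
    """Baseline 'translator' prompt: restate the explanation in plain language."""
    return (
        "Explain in plain language why the model made this prediction, "
        "given these feature attributions:\n"
        f"{format_attributions(attributions)}"
    )


def devils_advocate_prompt(attributions: Dict[str, float]) -> str:
    """'Agitator' prompt: ask the LLM to interrogate the explanation instead."""
    questions = "\n".join(
        f"{i}. Describe {axis}." for i, axis in enumerate(CRITIQUE_AXES, start=1)
    )
    return (
        "You are a devil's advocate reviewing an AI explanation. "
        "Do not simply restate it. Given these feature attributions:\n"
        f"{format_attributions(attributions)}\n"
        "Challenge the explanation as follows:\n"
        f"{questions}"
    )


if __name__ == "__main__":
    print(translate_prompt(ATTRIBUTIONS))
    print()
    print(devils_advocate_prompt(ATTRIBUTIONS))
```

Either string could then be passed to whichever LLM a system uses. The point of the sketch is only that the devil's-advocate variant asks for limitations and counter-readings explicitly, whereas the translator variant asks for a fluent restatement.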
