Unraveling the Dilemma of AI Errors: Exploring the Effectiveness of Human and Machine Explanations for Large Language Models
Marvin Pafla, Kate Larson, Mark Hancock
TL;DR
The paper assesses how effective human and machine explanations are for Large Language Models in a question-answering (QA) setting built on SQuAD-based tasks. It collects 156 human explanations and contrasts them with machine explanations from integrated gradients, conservative LRP, and ChatGPT, evaluating them in a large online study (N=136) that covers both correct and incorrect AI outputs. Participants rated human saliency explanations as more helpful than machine ones, yet explanations can lower task performance when they accompany incorrect AI predictions, revealing an AI-explanation dilemma and a risk of confirmation bias. The study offers design and research recommendations for XAI practice, emphasizing that evaluations should include incorrect predictions and that explanations should be framed as exploratory aids rather than definitive accounts.
Abstract
The field of eXplainable artificial intelligence (XAI) has produced a plethora of methods (e.g., saliency maps) to gain insight into artificial intelligence (AI) models, and it has exploded with the rise of deep learning (DL). However, human-participant studies question the efficacy of these methods, particularly when the AI output is wrong. In this study, we collected and analyzed 156 human-generated text and saliency-based explanations from a question-answering task (N=40) and compared them empirically to state-of-the-art XAI explanations (integrated gradients, conservative LRP, and ChatGPT) in a human-participant study (N=136). Our findings show that participants found human saliency maps more helpful than machine saliency maps in explaining AI answers, but performance correlated negatively with trust in the AI model and its explanations. This finding hints at the dilemma of AI errors in explanation: helpful explanations can lead to lower task performance when they support wrong AI predictions.
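To make "machine saliency map" concrete, here is a minimal sketch of integrated gradients, one of the XAI methods named above. It assumes a hypothetical toy model (a logistic score over token embeddings, with its gradient written out by hand), a zero baseline, and a midpoint Riemann approximation of the path integral. None of these choices come from the study, which explained a QA language model. The function returns one attribution score per token. Those scores are what a saliency map visualizes.

```python
import numpy as np


def model(x, w):
    """Hypothetical toy model: sigmoid of the summed token scores x @ w.

    x: (num_tokens, dim) token embeddings; w: (dim,) weight vector.
    """
    return 1.0 / (1.0 + np.exp(-np.sum(x @ w)))


def model_grad(x, w):
    """Analytic gradient of model(x, w) with respect to every entry of x."""
    s = model(x, w)
    # d/dx[i, j] of sum(x @ w) is w[j], scaled by the sigmoid derivative s(1 - s)
    return s * (1.0 - s) * np.tile(w, (x.shape[0], 1))


def integrated_gradients(x, baseline, w, steps=50):
    """Approximate integrated gradients along the straight path baseline -> x."""
    # Midpoints of `steps` equal sub-intervals of [0, 1]
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(x)
    for a in alphas:
        grad_sum += model_grad(baseline + a * (x - baseline), w)
    # Per-feature attribution: (input - baseline) times the average gradient
    attributions = (x - baseline) * grad_sum / steps
    # Sum over embedding dimensions to get one saliency score per token
    return attributions.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 8))   # 5 hypothetical tokens, 8-dim embeddings
    w = rng.normal(size=8)
    baseline = np.zeros_like(x)
    scores = integrated_gradients(x, baseline, w, steps=200)
    print("per-token saliency:", scores)
    # Completeness check: attributions should sum to model(x) - model(baseline)
    print(scores.sum(), model(x, w) - model(baseline, w))
```

A useful sanity check for any integrated-gradients implementation is the completeness property: the per-token scores should sum, up to discretization error, to the change in model output between the baseline and the input. Increasing `steps` tightens that approximation.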
