Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian
Mobina Mehrazar, Mohammad Amin Yousefi, Parisa Abolfath Beygi, Behnam Bahrak
TL;DR
This study evaluates whether LLM-generated self-explanations faithfully reflect the model's decision-making in Persian emotion detection, using confidence scores derived from token-level log-probabilities and calibrated for reliability. It compares two prompting orders, Predict-then-Explain (P-E) and Explain-then-Predict (E-P), across GPT-family and other models on the ARMANEMO Persian corpus. Classification performance is strong, but the explanations show limited faithfulness relative to human judgments. Calibration via temperature scaling improves confidence alignment, yet the explanations still diverge from human reasoning: models agree more with each other than with human annotators. The results highlight the need for more robust explanation strategies and evaluation metrics for multilingual, low-resource contexts, and suggest that post-hoc rationalization (P-E) generally yields better faithfulness than joint generation (E-P).
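Temperature scaling, as mentioned above, can be illustrated with a minimal sketch. It assumes that per-class scores (for example, log-probabilities of the label tokens) are available for a held-out set and fits a single temperature by grid search on negative log-likelihood. The function names, the grid, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over class scores after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the gold labels at a given temperature."""
    loss = 0.0
    for logits, gold in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[gold])
    return loss / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid (hypothetical range 0.25..5.0) that minimizes NLL."""
    grid = grid or [0.25 + 0.05 * i for i in range(96)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Toy held-out set (made up): the model is very confident but only 3 of 4 are right.
val_logits = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0, 0.0], T)
print(f"fitted temperature: {T:.2f}, calibrated top confidence: {max(calibrated):.2f}")
```

In this toy case the fitted temperature is greater than 1, which softens the overconfident scores so that the top confidence moves toward the observed accuracy of 75%. The predicted class is unchanged, since dividing by a positive temperature preserves the ranking of the scores.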
Abstract
Large language models (LLMs) are increasingly used to generate self-explanations alongside their predictions, a practice that raises concerns about the faithfulness of these explanations, especially in low-resource languages. This study evaluates the faithfulness of LLM-generated explanations in the context of emotion classification in Persian, a low-resource language, by comparing the influential words identified by the model against those identified by human annotators. We assess faithfulness using confidence scores derived from token-level log-probabilities. Two prompting strategies, differing in the order of explanation and prediction (Predict-then-Explain and Explain-then-Predict), are tested for their impact on explanation faithfulness. Our results reveal that while LLMs achieve strong classification performance, their generated explanations often diverge from faithful reasoning, showing greater agreement with each other than with human judgments. These results highlight the limitations of current explanation methods and metrics, emphasizing the need for more robust approaches to ensure LLM reliability in multilingual and low-resource contexts.
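As a rough illustration of the two quantities described in the abstract, the sketch below derives a label confidence from token-level log-probabilities and measures the overlap between model-identified and human-identified influential words. The normalization over candidate labels and the use of Jaccard overlap are assumptions made for this example; the abstract does not specify the exact formulas, and all names and values are hypothetical.

```python
import math

def label_confidence(token_logprobs_by_label):
    """Turn per-token log-probabilities for each candidate label into a
    normalized confidence. The joint log-probability of a label is the sum
    of its token log-probabilities (an assumption for this sketch)."""
    joint = {label: sum(lps) for label, lps in token_logprobs_by_label.items()}
    peak = max(joint.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in joint.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def jaccard(words_a, words_b):
    """Jaccard overlap between two collections of influential words."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)

# Made-up log-probabilities for two candidate emotion labels.
conf = label_confidence({"happy": [-0.1, -0.1], "sad": [-2.0, -1.0]})

# Made-up influential-word selections for one sentence.
model_words = ["w1", "w2", "w3"]
human_words = ["w2", "w3", "w4"]
overlap = jaccard(model_words, human_words)

print(conf, overlap)
```

The same overlap function can be applied to pairs of models as well as to model-human pairs, which is one simple way to compare model-model agreement with human-model agreement.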
