Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
Zhixue Zhao, Nikolaos Aletras
TL;DR
This paper investigates whether explanations produced by feature attribution (FA) methods faithfully reflect the inner reasoning of multilingual versus monolingual language models. It conducts a large-scale, cross-language study using five languages, two model families (multilingual and monolingual), and five FA methods across diverse tasks, assessing faithfulness with hard and soft sufficiency and comprehensiveness metrics, summarized with the area over the perturbation curve (AOPC). The findings show that faithfulness disparities depend on model size and are strongly influenced by tokenization: larger multilingual models (e.g., XLM-R) tend to yield less faithful rationales than their monolingual counterparts, and aggressive multilingual tokenizers contribute to these gaps. Soft faithfulness metrics reduce many of the disparities, and targeted experiments point to tokenization as a primary driver. The results suggest practical guidelines for selecting models when explainability is critical and motivate future work on tokenizer-aware evaluation across more languages and architectures.
Abstract
In many real-world natural language processing applications, practitioners aim not only to maximize predictive performance but also to obtain faithful explanations for model predictions. Rationales and importance distributions given by feature attribution methods (FAs) provide insights into how different parts of the input contribute to a prediction. Previous studies have explored how different factors affect faithfulness, mainly in the context of monolingual English models. However, the differences in FA faithfulness between multilingual and monolingual models have yet to be explored. Our extensive experiments, covering five languages and five popular FAs, show that FA faithfulness varies between multilingual and monolingual models. We find that the larger the multilingual model, the less faithful its FAs are compared to those of its monolingual counterparts. Our further analysis shows that the faithfulness disparity is potentially driven by the differences between model tokenizers. Our code is available at https://github.com/casszhao/multilingual-faith.
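
FA faithfulness is typically measured by perturbing the input according to the attribution scores and observing how the predicted probability changes. The following is a minimal sketch, assuming the commonly used hard definitions (comprehensiveness: drop in probability when the top-k tokens are removed; sufficiency: drop when only the top-k tokens are kept) and a simple average over several values of k as an AOPC-style summary. It is not taken from the paper's code: `toy_predict`, the token list, and the attribution scores are hypothetical stand-ins for a fine-tuned model and a real FA method, and the paper's exact normalization may differ.

```python
from typing import Callable, List, Sequence

Predict = Callable[[List[str]], float]  # returns P(predicted class | tokens)


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tokens, returned in input order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def comprehensiveness(predict: Predict, tokens: List[str],
                      scores: Sequence[float], k: int) -> float:
    """Probability drop after REMOVING the top-k tokens (higher = more faithful)."""
    rationale = set(top_k_indices(scores, k))
    reduced = [t for i, t in enumerate(tokens) if i not in rationale]
    return predict(tokens) - predict(reduced)


def sufficiency(predict: Predict, tokens: List[str],
                scores: Sequence[float], k: int) -> float:
    """Probability drop when KEEPING only the top-k tokens (lower = more faithful)."""
    rationale = [tokens[i] for i in top_k_indices(scores, k)]
    return predict(tokens) - predict(rationale)


def aopc(metric: Callable[..., float], predict: Predict, tokens: List[str],
         scores: Sequence[float], ks: Sequence[int]) -> float:
    """Average a metric over several rationale lengths (AOPC-style summary)."""
    return sum(metric(predict, tokens, scores, k) for k in ks) / len(ks)


def toy_predict(tokens: List[str]) -> float:
    """Hypothetical classifier: confidence grows with positive cue words."""
    hits = sum(t in {"great", "good"} for t in tokens)
    return 0.5 + 0.2 * min(hits, 2)


# Hypothetical input and attribution scores (one score per token).
tokens = ["the", "film", "was", "great", "and", "good"]
scores = [0.0, 0.1, 0.0, 0.9, 0.0, 0.8]

comp_at_2 = comprehensiveness(toy_predict, tokens, scores, k=2)
suff_at_2 = sufficiency(toy_predict, tokens, scores, k=2)
comp_aopc = aopc(comprehensiveness, toy_predict, tokens, scores, ks=[1, 2])
```

Because the scores are assigned per token, the same text can yield different rationales under different tokenizers: a tokenizer that splits words into more pieces changes what "top-k tokens" covers, which is one way tokenization can interact with these metrics.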
