Towards Faithful Explanations: Boosting Rationalization with Shortcuts Discovery
Linan Yue, Qi Liu, Yichao Du, Li Wang, Weibo Gao, Yanqing An
TL;DR
This paper tackles the problem of faithful explanations in neural text classification by addressing shortcuts that spuriously align inputs with predictions. It introduces Shortcuts-fused Selective Rationalization (SSR), which first discovers potential shortcut tokens and then applies two strategies (shared parameters and prediction-time de-correlation, plus a virtual-shortcuts variant) to mitigate shortcut-driven rationales. Data augmentation complements these strategies by bridging the gap between labeled and unlabeled data. Across ERASER benchmarks, SSR variants outperform unsupervised and semi-supervised baselines and come close to or exceed some supervised methods, with semantic data augmentation providing notable gains and improved out-of-domain generalization. The work offers a practical, model-agnostic approach to more faithful explanations, with potential applicability to privacy-sensitive or locally deployed decision systems and avenues for extension to LLM explanations.
Abstract
The remarkable success of neural networks has prompted interest in selective rationalization, which explains prediction results by identifying a small subset of the input that is sufficient to support them. Existing methods still suffer from two problems: they adopt shortcuts in the data to compose rationales, and large-scale human-annotated rationales are scarce. In this paper, we propose a Shortcuts-fused Selective Rationalization (SSR) method, which boosts rationalization by discovering and exploiting potential shortcuts. Specifically, SSR first designs a shortcuts discovery approach to detect several potential shortcuts. Then, by introducing the identified shortcuts, we propose two strategies to mitigate the problem of using shortcuts to compose rationales. Finally, we develop two data augmentation methods to close the gap in the number of annotated rationales. Extensive experimental results on real-world datasets validate the effectiveness of the proposed method.
