RORA: Robust Free-Text Rationale Evaluation
Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu
TL;DR
RORA tackles the problem of evaluating free-text rationales in NLP when label leakage skews the evaluation. It introduces a three-stage pipeline (leakage detection, counterfactual data augmentation, and invariant evaluation via invariant risk minimization, IRM) that quantifies the non-leaky information a rationale provides about the label. By grounding the evaluation in conditional $\mathcal{V}$-information and enforcing invariance across leakage-perturbed environments, RORA aligns closely with human judgments and outperforms existing baselines on StrategyQA and CommonsenseQA. The approach is model-agnostic and adaptable to various rationale types, offering a practical, principled metric that rewards rationales for supplying genuinely new information rather than restating the label.
Abstract
Free-text rationales play a pivotal role in explainable NLP, bridging the knowledge and reasoning gaps behind a model's decision-making. However, due to the diversity of potential reasoning paths and a corresponding lack of definitive ground truth, their evaluation remains a challenge. Existing evaluation metrics rely on the degree to which a rationale supports a target label, but we find these fall short in evaluating rationales that inadvertently leak the labels. To address this problem, we propose RORA, a Robust free-text Rationale evaluation method against label leakage. RORA quantifies the new information supplied by a rationale to justify the label. This is achieved by assessing the conditional $\mathcal{V}$-information \citep{hewitt-etal-2021-conditional} with a predictive family robust against leaky features that can be exploited by a small model. RORA consistently outperforms existing approaches in evaluating human-written, synthetic, or model-generated rationales, particularly demonstrating robustness against label leakage. We also show that RORA aligns well with human judgment, providing a more reliable and accurate measurement across diverse free-text rationales.
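To make the quantity concrete: conditional $\mathcal{V}$-information is commonly written as $I_{\mathcal{V}}(R \to Y \mid X) = H_{\mathcal{V}}(Y \mid X) - H_{\mathcal{V}}(Y \mid X, R)$, where each entropy term is the best achievable expected negative log-likelihood within the predictive family $\mathcal{V}$. The sketch below estimates this difference from held-out gold-label probabilities. It is a minimal illustration, not RORA's actual estimator: the function names are hypothetical, the probabilities are assumed to come from two already-fitted models (one seeing only the input, one seeing the input plus the rationale), and the leakage-robust restriction of the predictive family is not shown.

```python
import math


def v_entropy(gold_probs):
    """Mean negative log-probability (in nats) that a fitted model assigns
    to the gold label. With a well-fitted model from the family V, this
    approximates the conditional V-entropy."""
    if not gold_probs:
        raise ValueError("gold_probs must be non-empty")
    return -sum(math.log(p) for p in gold_probs) / len(gold_probs)


def conditional_v_information(probs_without_rationale, probs_with_rationale):
    """Estimate I_V(R -> Y | X) = H_V(Y | X) - H_V(Y | X, R).

    Both arguments are per-example probabilities of the gold label on the
    same held-out examples (hypothetical inputs for illustration).
    """
    return v_entropy(probs_without_rationale) - v_entropy(probs_with_rationale)


# Toy usage: the baseline model is at chance on a binary task, while the
# model that also sees the rationale is more confident in the gold label.
baseline = [0.5, 0.5, 0.5, 0.5]
with_rationale = [0.9, 0.8, 0.95, 0.7]
score = conditional_v_information(baseline, with_rationale)
```

A positive score means the rationale lowers the model's uncertainty about the label beyond what the input already provides. A rationale that simply restates the label would also score highly under this plain estimate, which is the failure mode the abstract's leakage-robust predictive family is meant to address.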
