RORA: Robust Free-Text Rationale Evaluation

Zhengping Jiang, Yining Lu, Hanjie Chen, Daniel Khashabi, Benjamin Van Durme, Anqi Liu

TL;DR

RoRa tackles the challenge of evaluating free-text rationales in NLP when label leakage skews judgments. It introduces a three-stage pipeline (leakage detection, counterfactual data augmentation, and invariant evaluation via invariant risk minimization, IRM) to quantify the non-leaky information a rationale provides about the label. By grounding the evaluation in conditional $\mathcal{V}$-information and enforcing invariance across leakage-perturbed environments, RoRa aligns robustly with human judgments and outperforms existing baselines on StrategyQA and CommonsenseQA. The approach is model-agnostic and adaptable to various rationale types, offering a practical, principled metric that rewards rationales advancing genuine understanding rather than tautological leakage.
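To make the invariance idea concrete, the following is a minimal sketch of an IRM-style objective for a binary label. It assumes the widely used IRMv1 penalty (the squared gradient of each environment's risk with respect to a dummy scale fixed at 1) and treats each environment as a list of evaluator logits with gold labels. The function names, the environment layout, and the weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def env_risk_and_grad(logits, labels):
    """Mean logistic loss in one environment, plus its derivative with
    respect to a dummy scale w multiplying the logits, evaluated at w = 1.
    For loss(w) = -[y log s(wz) + (1 - y) log(1 - s(wz))], d/dw = (s(wz) - y) z.
    """
    n = len(logits)
    risk, grad = 0.0, 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        risk += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        grad += (p - y) * z
    return risk / n, grad / n

def irm_objective(envs, lam):
    """Average over environments of risk + lam * (gradient penalty).
    `envs` is a list of (logits, labels) pairs, e.g. original data and a
    counterfactually edited copy (hypothetical layout)."""
    total = 0.0
    for logits, labels in envs:
        risk, grad = env_risk_and_grad(logits, labels)
        total += risk + lam * grad ** 2
    return total / len(envs)
```

A larger `lam` penalizes predictors whose optimal scaling differs across environments, which is the mechanism that discourages reliance on features that only help in some environments.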

Abstract

Free-text rationales play a pivotal role in explainable NLP, bridging the knowledge and reasoning gaps behind a model's decision-making. However, due to the diversity of potential reasoning paths and a corresponding lack of definitive ground truth, their evaluation remains a challenge. Existing evaluation metrics rely on the degree to which a rationale supports a target label, but we find these fall short in evaluating rationales that inadvertently leak the labels. To address this problem, we propose RORA, a Robust free-text Rationale evaluation against label leakage. RORA quantifies the new information supplied by a rationale to justify the label. This is achieved by assessing the conditional $\mathcal{V}$-information (Hewitt et al., 2021) with a predictive family robust against leaky features that can be exploited by a small model. RORA consistently outperforms existing approaches in evaluating human-written, synthetic, or model-generated rationales, particularly demonstrating robustness against label leakage. We also show that RORA aligns well with human judgment, providing a more reliable and accurate measurement across diverse free-text rationales.
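As a rough illustration of the quantity being measured, conditional $\mathcal{V}$-information can be written as $I_{\mathcal{V}}(R \to Y \mid X) = H_{\mathcal{V}}(Y \mid X) - H_{\mathcal{V}}(Y \mid X, R)$, where each $H_{\mathcal{V}}$ is the lowest expected negative log-likelihood achievable within the predictive family. The sketch below estimates it from the gold-label log-probabilities of two already-fitted models, one that sees only the input and one that also sees the rationale. The function names and inputs are hypothetical, and because trained models only approximate the infimum, the result is an estimate rather than the exact quantity.

```python
import math

def v_entropy(gold_log_probs):
    """Empirical conditional V-entropy: mean negative log-likelihood (in nats)
    that a fitted model assigns to the gold labels on held-out data."""
    return -sum(gold_log_probs) / len(gold_log_probs)

def conditional_v_information(logp_without_rationale, logp_with_rationale):
    """Estimate of I_V(R -> Y | X) = H_V(Y | X) - H_V(Y | X, R).
    Both arguments are lists of log p(gold label) over the same examples."""
    return v_entropy(logp_without_rationale) - v_entropy(logp_with_rationale)

# Toy numbers (made up): the rationale raises gold-label probability from 0.5 to 0.8.
baseline = [math.log(0.5)] * 4
with_rationale = [math.log(0.8)] * 4
score = conditional_v_information(baseline, with_rationale)  # log(0.8 / 0.5) nats
```

A positive score means the rationale made the label easier to predict for the chosen family. A leakage-robust metric additionally needs the rationale-aware model to be unable to exploit label-restating tokens, which is what the invariant training stage is meant to provide.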

Paper Structure (29 sections, 19 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: RoRa framework for evaluating rationales $R_1^{\text{True}}$, $R_2^{\text{True}}$, $R_3^{\text{True}}$. Existing baselines are highly sensitive to rationales that simply restate the label or paraphrase the given question and label, leading to inflated scores compared to the human-annotated rationale. In contrast, RoRa provides an informativeness score that better characterizes rationale quality. It does so by (1) detecting potential leakage tokens in the rationale, (2) generating additional training data with counterfactual editing for data augmentation, and (3) training an evaluation model invariant to label leakage.
  • Figure 2: Linear regression on pointwise score correlation between human evaluation results and RoRa scores. Shades correspond to a 95% confidence interval.
  • Figure 3: Sensitivity test results of RoRa on the leakage detection threshold and the IRM regularization parameter. Decreasing the threshold and increasing the IRM regularization parameter help RoRa better counteract label leakage. RoRa appears stable when the parameters are chosen from a relatively wide range.
  • Figure 4: A causal graph showing the generative story of the label $Y$ of an instance, where correctness indicator variables from LAS (Hase et al., 2020) are replaced by an actual variable.
  • Figure 5: The interface to our human validation HIT.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Definition 1: Multivariable $\mathcal{V}$-information
  • Remark