Fine-grained Sentiment Analysis with Faithful Attention

Ruiqi Zhong, Steven Shao, Kathleen McKeown

TL;DR

This work tackles targeted sentiment analysis by predicting sentiment relations between a specified source and target, leveraging supervised attention guided by human rationales. The method augments a state-of-the-art relation extractor with a KL-divergence attention loss, $\mathcal{L} = \mathcal{L}_{\text{clf}} + \lambda_{\text{attn}} \mathcal{L}_{\text{attn}}$, where $\mathcal{L}_{\text{attn}} = \mathrm{KL}(A \,\|\, \hat{A})$, and optionally a rationale-prediction term $\mathcal{L}_{r}$; a variant with limited rationales is explored. Across MPQA2.0 and GFBF, the trained-attention approach yields 4–8 point improvements over untrained baselines and outperforms a rationale-based multi-task baseline, with only a small fraction of rationales needed to achieve substantial gains. The paper also introduces probes-needed and mass-needed as faithfulness metrics and uses crowd-sourced plausibility tests, revealing that trained attention can be more plausible than untrained attention, though faithfulness is dataset-dependent. The findings suggest that integrating concise human rationales can repair attention in low-data regimes (notably GFBF) and provide meaningful, human-aligned explanations in sentiment-relational tasks, with practical implications for interpretable relation extraction.
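The combined objective above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes that $A$ is a target distribution built by spreading probability mass uniformly over the human-marked rationale tokens, that $\hat{A}$ is the model's attention over the same tokens, and that $\mathcal{L}_{\text{clf}}$ is a standard cross-entropy loss. The function names (`rationale_to_distribution`, `kl_divergence`, `total_loss`) and the default `lambda_attn=1.0` are hypothetical, and the optional term $\mathcal{L}_{r}$ is omitted.

```python
import math


def rationale_to_distribution(mask):
    """Turn a 0/1 rationale mask into a uniform distribution over marked tokens.

    Assumption: the summary does not say how A is built from rationales;
    uniform mass over marked tokens is one simple choice.
    """
    total = sum(mask)
    if total == 0:
        raise ValueError("rationale mask must mark at least one token")
    return [m / total for m in mask]


def kl_divergence(a, a_hat, eps=1e-12):
    """KL(a || a_hat) for two discrete distributions of equal length."""
    if len(a) != len(a_hat):
        raise ValueError("distributions must have the same length")
    # Terms with a[i] == 0 contribute nothing; eps guards against log(p / 0).
    return sum(p * math.log(p / max(q, eps)) for p, q in zip(a, a_hat) if p > 0)


def cross_entropy(class_probs, gold_label, eps=1e-12):
    """Negative log-probability assigned to the gold class (assumed L_clf)."""
    return -math.log(max(class_probs[gold_label], eps))


def total_loss(class_probs, gold_label, attention, rationale_mask, lambda_attn=1.0):
    """L = L_clf + lambda_attn * KL(A || A_hat), without the optional L_r term."""
    l_clf = cross_entropy(class_probs, gold_label)
    target = rationale_to_distribution(rationale_mask)
    l_attn = kl_divergence(target, attention)
    return l_clf + lambda_attn * l_attn


if __name__ == "__main__":
    probs = [0.1, 0.8, 0.1]           # hypothetical class probabilities
    mask = [0, 1, 1, 0]               # hypothetical rationale: tokens 1 and 2
    aligned = [0.05, 0.45, 0.45, 0.05]
    misaligned = [0.7, 0.1, 0.1, 0.1]
    print("aligned attention loss:   ", total_loss(probs, 1, aligned, mask))
    print("misaligned attention loss:", total_loss(probs, 1, misaligned, mask))
```

Under these assumptions, attention that concentrates on the rationale tokens incurs a smaller penalty than attention that concentrates elsewhere, and setting `lambda_attn` to 0 recovers the plain classification loss.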

Abstract

While the general task of textual sentiment classification has been widely studied, much less research looks specifically at sentiment between a specified source and target. To tackle this problem, we experimented with a state-of-the-art relation extraction model. Surprisingly, we found that despite reasonable performance, the model's attention was often systematically misaligned with the words that contribute to sentiment. Thus, we directly trained the model's attention with human rationales and improved our model's performance by a robust 4 to 8 points on all tasks we defined on our data sets. We also present a rigorous analysis of the model's attention, both trained and untrained, using novel and intuitive metrics. Our results show that untrained attention does not provide faithful explanations; however, attention trained with concisely annotated human rationales not only increases performance but also yields faithful explanations. Encouragingly, a small number of annotated human rationales suffices to correct the attention in our task.

Paper Structure

This paper contains 33 sections, 14 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: F-score improvement (left) and attention loss improvement (right) on the y-axis vs. the percentage of human-annotated rationales used (x-axis). Attention loss is reported as the difference from the attention loss obtained when training with all rationales. To draw the plot, for MPQA, we sample 100, 200, 400, and 800 rationales (corresponding to 4%, 8%, 16%, and 33% of all rationales); and for GFBF, 50, 100, 200, and 400 rationales (7%, 13%, 27%, and 53%).
  • Figure 2: A typical data point where the untrained model attends to the target, and the trained model attends to the key word for the relation (in this case, between "Socialized medicine" and "Medicare").
  • Figure 3: An example question shown to workers.