Why is "Problems" Predictive of Positive Sentiment? A Case Study of Explaining Unintuitive Features in Sentiment Classification

Jiaming Qu, Jaime Arguello, Yue Wang

TL;DR

This paper addresses a gap in explainable AI: some input features that are predictive in sentiment classification appear unintuitive to humans. It uses an LLM-based zero-shot estimator to detect such features and three explanation tools (data distribution, training examples, and contextual patterns) to explain them. In a two-phase crowdsourced study (N=300) across product categories, the authors show that while single tools can aid objective judgments, the best understanding and user experience come from combining tools that link predictive features to training data and contextual usage. The work offers practical guidance for designing XAI explanations that not only identify predictive features but also show why they are predictive, potentially increasing trust and learning in real-world tasks.

Abstract

Explainable AI (XAI) algorithms aim to help users understand how a machine learning model makes predictions. To this end, many approaches explain which input features are most predictive of a target label. However, such explanations can still be puzzling to users (e.g., in product reviews, the word "problems" is predictive of positive sentiment). If left unexplained, puzzling explanations can have negative impacts. Explaining unintuitive associations between an input feature and a target label is an underexplored area in XAI research. We make an initial effort in this direction, using unintuitive associations learned by sentiment classifiers as a case study. We propose approaches for (1) automatically detecting associations that can appear unintuitive to users and (2) generating explanations to help users understand why an unintuitive feature is predictive. Results from a crowdsourced study (N=300) show that our proposed approaches can effectively detect and explain predictive but unintuitive features in sentiment classification.

Paper Structure (21 sections, 6 figures, 1 table)

Figures (6)

  • Figure 1: Phase 2 interface design. Questions and tools (if any) were displayed side-by-side (A). Visual representations of our tools are shown in subfigures B-D.
  • Figure 2: Effects of different interface conditions on participants' understanding with means and 95% confidence intervals. The star mark highlights interface conditions with statistically significant effects ($p < .05$) compared to the Baseline condition.
  • Figure 3: Effects of different interface conditions on participants' perceptions with means and 95% confidence intervals. The star mark highlights interface conditions with statistical significance ($p < .05$) compared to the Baseline condition.
  • Figure 4: Effects of different interface conditions on participants' behaviors with means and 95% confidence intervals. The star mark highlights interface conditions with statistical significance ($p < .05$) compared to the Baseline condition.
  • Figure 5: Word sampling and task allocation. We used the zero-shot classifier to estimate $P_z(y=pos|w)$, the probability that word $w$ conveys positive sentiment, for every word in sets $\mathcal{S}^+$ and $\mathcal{S}^-$ (i.e., words predictive of positive and negative sentiment, respectively, according to the logistic regression model). For Phase 1, we applied stratified sampling to select 120 words and organized them into 12 batches of 10 words each. Each batch was redundantly judged by five participants. For Phase 2, we applied selective sampling to select 40 words and organized them into 10 batches of 4 words each. Each batch was redundantly judged by six participants, each in a different interface condition. Three criteria were used for the selective sampling. First, we included words with $P_z(y=pos|w) < 0.2$ from set $\mathcal{S}^+$. These are words that are paradoxically predictive of positive sentiment. Second, we included words with $P_z(y=pos|w) > 0.8$ from set $\mathcal{S}^-$. These are words that are paradoxically predictive of negative sentiment. Third, we included words with $0.2 \le P_z(y=pos|w) \le 0.8$ from sets $\mathcal{S}^+$ and $\mathcal{S}^-$. These are words that are unintuitive regardless of which sentiment they are predictive of.
  • ...and 1 more figure
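
The three selective-sampling criteria in the Figure 5 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `selective_sample` and `make_batches`, the group labels, and the toy probabilities are all assumptions made for the example. Only the thresholds (0.2 and 0.8) and the batch size of 4 come from the caption, and only "problems" being predictive of positive sentiment comes from the paper.

```python
from typing import Dict, List

# Thresholds quoted in the Figure 5 caption.
LOW, HIGH = 0.2, 0.8


def selective_sample(
    pz_pos_set: Dict[str, float],
    pz_neg_set: Dict[str, float],
) -> Dict[str, List[str]]:
    """Group candidate words by the three criteria in the caption.

    pz_pos_set maps each word in S+ (predictive of positive) to P_z(y=pos|w).
    pz_neg_set does the same for S- (predictive of negative).
    The group labels are hypothetical names chosen for this sketch.
    """
    both_sets = (pz_pos_set, pz_neg_set)
    return {
        # Criterion 1: in S+ but the zero-shot estimate says "negative".
        "paradoxically_positive": sorted(
            w for w, p in pz_pos_set.items() if p < LOW
        ),
        # Criterion 2: in S- but the zero-shot estimate says "positive".
        "paradoxically_negative": sorted(
            w for w, p in pz_neg_set.items() if p > HIGH
        ),
        # Criterion 3: the zero-shot estimate is undecided, in either set.
        "ambiguous": sorted(
            w for s in both_sets for w, p in s.items() if LOW <= p <= HIGH
        ),
    }


def make_batches(words: List[str], batch_size: int) -> List[List[str]]:
    """Split a word list into consecutive batches of at most batch_size."""
    return [words[i:i + batch_size] for i in range(0, len(words), batch_size)]


# Toy inputs: the probabilities and all words except "problems" are made up.
toy_pos = {"problems": 0.05, "great": 0.97, "also": 0.50}
toy_neg = {"perfect": 0.90, "broken": 0.03, "then": 0.45}

groups = selective_sample(toy_pos, toy_neg)
selected = [w for group in groups.values() for w in group]
batches = make_batches(selected, batch_size=4)
```

Words whose zero-shot estimate agrees with the classifier (here "great" in the positive set and "broken" in the negative set) satisfy none of the three criteria, so they are left out. The caption does not say how the 40 Phase 2 words were drawn from the words meeting the criteria, so the sketch stops at grouping and batching.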