Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models

Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel S. Weld

TL;DR

Polyjuice is a general-purpose counterfactual generator that allows control over perturbation types and locations. It is trained by finetuning GPT-2 on multiple datasets of paired sentences.

Abstract

While counterfactual examples are useful for analysis and training of NLP models, current generation methods either rely on manual labor to create very few counterfactuals, or only instantiate limited types of perturbations such as paraphrases or word substitutions. We present Polyjuice, a general-purpose counterfactual generator that allows for control over perturbation types and locations, trained by finetuning GPT-2 on multiple datasets of paired sentences. We show that Polyjuice produces diverse sets of realistic counterfactuals, which in turn are useful in various distinct applications: improving training and evaluation on three different tasks (with around 70% less annotation effort than manual generation), augmenting state-of-the-art explanation techniques, and supporting systematic counterfactual error analysis by revealing behaviors easily missed by human experts.

Paper Structure

This paper contains 32 sections, 3 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Overview: (A) given a sentiment analysis instance $x$, Polyjuice generates (B) various counterfactuals $\hat{x}$, which are then (C) selected for downstream use. For example, in (D) we select counterfactual explanations that complement a black box explanation: though "great" and "kids" are deemed important, perturbing them may not affect the prediction $f(x)=f(\hat{x})=\text{positive}$, revealing model failures not covered by feature attributions.
  • Figure 2: (A) Polyjuice prompt format, which concatenates the original $x$, the control code, and the $\hat{x}$ ("It is not great for children" converted to an infilling structure). At generation time, Polyjuice accepts prompts that just include $x$ (Line 1), or optionally with the code and the [BLANK]s (Lines 2--3), and fills in the blanks sequentially with spans separated by [ANSWER]s (Line 4). (B) Polyjuice allows blanking at different granularities (even the entire sentence), such that Lines 3--4 in (A) can be replaced by Lines 6--7 or 8--9.
  • Figure 3: (A) An instance in QQP where the model prediction $f(x)$ is Duplicate ($=$) at 98.2% confidence, with SHAP importance weights for tokens in Q2. Counterfactual explanations complement SHAP with concrete examples and surprising behaviors, e.g., (B) shows that friend $\rightarrow$ woman surprisingly flips the prediction to Non-Duplicate ($\neq$), despite the low weight on "friend."
  • Figure 4: Simulation error rates per condition (higher is better). Polyjuice-surprise has the highest error rate, indicating these counterfactuals would add the most information to users if displayed.
  • Figure 5: (A) An NLI case with a Neutral prediction (underlined $f(\hat{x})$ are correct). Polyjuice generates counterfactual hypotheses conditioned on the negation control code. (B) Generalizing perturbations into patterns (Wu et al., 2020). The change DET $\rightarrow$ no flips $92.8\%$ of predictions from Neutral $\rightarrow$ Contradiction.
  • ...and 5 more figures
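
The infilling step described in the Figure 2 caption (blanks filled sequentially with spans separated by [ANSWER]) can be illustrated with a minimal sketch. This is not the Polyjuice implementation: the function name `fill_blanks` and the exact whitespace handling are assumptions made for illustration, and only the `[BLANK]` and `[ANSWER]` markers come from the caption.

```python
BLANK = "[BLANK]"    # marker named in the Figure 2 caption
ANSWER = "[ANSWER]"  # marker named in the Figure 2 caption


def fill_blanks(template: str, generated: str) -> str:
    """Fill each [BLANK] in `template`, left to right, with the spans in `generated`.

    `generated` is assumed (hypothetically) to look like
    "span1 [ANSWER] span2 [ANSWER]", i.e. one span per blank,
    each terminated by an [ANSWER] marker.
    """
    spans = [s.strip() for s in generated.split(ANSWER)]
    # A trailing [ANSWER] leaves one empty piece at the end; drop it.
    if spans and spans[-1] == "":
        spans.pop()

    pieces = template.split(BLANK)
    n_blanks = len(pieces) - 1
    if len(spans) != n_blanks:
        raise ValueError(f"expected {n_blanks} spans, got {len(spans)}")

    # Interleave the fixed text pieces with the generated spans.
    out = [pieces[0]]
    for span, piece in zip(spans, pieces[1:]):
        out.append(span)
        out.append(piece)
    return "".join(out)


if __name__ == "__main__":
    # Word-level blanks, using the example sentence from the caption.
    print(fill_blanks("It is [BLANK] great for [BLANK].", "not [ANSWER] children [ANSWER]"))
    # Blanking the entire sentence, the coarsest granularity mentioned in (B).
    print(fill_blanks("[BLANK]", "It is not great for children. [ANSWER]"))
```

Both calls reconstruct the same counterfactual, which mirrors the caption's point that blanks can be placed at different granularities.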