Trick Me If You Can: Human-in-the-loop Generation of Adversarial Examples for Question Answering

Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber

TL;DR

This work introduces a human-in-the-loop framework for generating adversarial question answering (QA) data. Trivia experts write the questions, guided by model interpretations shown in an interactive interface. Applying the framework to Quizbowl, the authors create a diverse, adversarially authored dataset that humans can still solve but that sharply reduces QA model accuracy, uncovering weaknesses in reasoning and susceptibility to distractors. The study combines offline transfer experiments across IR and neural QA systems with live human-vs-computer matches, in which humans consistently outperform state-of-the-art systems on the adversarial questions. Overall, the approach reveals concrete failure modes and offers a practical path toward more robust QA and toward adversarial dataset creation more broadly.

Abstract

Adversarial evaluation stress tests a model's understanding of natural language. While past approaches expose superficial patterns, the resulting adversarial examples are limited in complexity and diversity. We propose human-in-the-loop adversarial generation, where human authors are guided to break models. We aid the authors with interpretations of model predictions through an interactive user interface. We apply this generation framework to a question answering task called Quizbowl, where trivia enthusiasts craft adversarial questions. The resulting questions are validated via live human-computer matches: although the questions appear ordinary to humans, they systematically stump neural and information retrieval models. The adversarial questions cover diverse phenomena from multi-hop reasoning to entity type distractors, exposing open challenges in robust question answering.

Paper Structure

This paper contains 34 sections, 1 equation, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Adversarial evaluation in NLP typically focuses on a specific phenomenon (e.g., word replacements) and then generates the corresponding examples (top). Consequently, adversarial examples are limited to the diversity of what the underlying generative model or perturbation rule can produce, and they also require downstream human evaluation to ensure validity. Our setup (bottom) instead uses human-authored examples, relying on human-computer collaboration to craft adversarial examples with greater diversity.
  • Figure 2: An example Quizbowl question. The question becomes progressively easier (for humans) to answer later on; thus, more knowledgeable players can answer after hearing fewer clues. Our adversarial writing process ensures that the clues also challenge computers.
  • Figure 3: The author writes a question (top right), the QA system provides guesses (left), and it explains why it makes those guesses (bottom right). The author can then adapt their question to "trick" the model.
  • Figure 4: The first round of adversarial writing attacks the IR model. Like regular test questions, adversarially authored questions begin with difficult clues that trick the model. However, the adversarial questions are significantly harder during the crucial middle third of the question.
  • Figure 6: Humans find adversarially authored questions about as difficult as normal questions, whether they are rusty weekend warriors (Intermediate), active players (Expert), or the best trivia players in the world (National).
  • ...and 4 more figures
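Figures 2 and 4 both rest on Quizbowl's incremental format: a question is revealed word by word, and a system (or player) can be scored by how much of the question it needs before answering correctly. As a minimal sketch of that measurement, and not the authors' evaluation code, the Python below computes accuracy after revealing a given fraction of each question's words. The `guesser` callable, the `accuracy_by_position` helper, the `toy_guesser` function, and the example questions are all hypothetical stand-ins for a real IR or neural QA model and dataset.

```python
from typing import Callable, List, Tuple

# Hypothetical guesser interface: maps a (possibly partial) question to one answer.
Guesser = Callable[[str], str]


def accuracy_by_position(
    questions: List[Tuple[str, str]],  # (question text, gold answer) pairs
    guesser: Guesser,
    fractions: List[float],
) -> List[float]:
    """For each fraction f, reveal the first f of each question's words and
    return the share of questions the guesser answers correctly."""
    if not questions:
        raise ValueError("need at least one question")
    results = []
    for f in fractions:
        correct = 0
        for text, answer in questions:
            words = text.split()
            # Reveal at least one word so that f = 0 still queries the model.
            n = max(1, round(f * len(words)))
            if guesser(" ".join(words[:n])) == answer:
                correct += 1
        results.append(correct / len(questions))
    return results


def toy_guesser(prefix: str) -> str:
    # Hypothetical stand-in for an IR model: it keys on a single word and
    # always answers "Mercury" once "planet" has been revealed.
    return "Mercury" if "planet" in prefix.split() else "unknown"


if __name__ == "__main__":
    qs = [
        ("This element is a liquid metal at room temperature and also "
         "names the closest planet to the Sun", "Mercury"),
        ("Name this largest planet in the solar system", "Jupiter"),
    ]
    print(accuracy_by_position(qs, toy_guesser, [0.25, 0.5, 1.0]))
    # prints [0.0, 0.0, 0.5]
```

With a real model plugged in as `guesser`, sweeping `fractions` from 0 to 1 traces an accuracy-versus-position curve. Comparing such curves for regular and adversarially authored questions is one way to see where in a question the adversarial clues do their work. The toy guesser's miss on the second question is a crude illustration of a keyword distractor.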