From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models

Julia Mendelsohn, Ronan Le Bras, Yejin Choi, Maarten Sap

TL;DR

This work frames dogwhistles as dual-meaning coded rhetoric that challenges existing NLP moderation by enabling covert signals to in-groups while masking hostility to out-groups. It advances the field with a taxonomy, a large living glossary (340 terms, 1,000+ forms), and a case study on historical U.S. Congressional speeches to ground analysis in real data. It then assesses large language models (GPT-3 and GPT-4) on recognizing and surfacing dogwhistles, revealing substantial variability by register, persona, and prompt design, and demonstrates that dogwhistles can evade toxicity detectors like Perspective API. Collectively, the paper provides valuable resources for researchers and highlights important implications for online safety and social science research, while calling for context-aware moderation and further integration with mathematical and computational models to detect and mitigate coded rhetoric at scale.

Abstract

Dogwhistles are coded expressions that simultaneously convey one meaning to a broad audience and a second one, often hateful or provocative, to a narrow in-group; they are deployed to evade both political repercussions and algorithmic content moderation. For example, in the sentence 'we need to end the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to many, but secretly means 'Jewish' to a select few. We present the first large-scale computational investigation of dogwhistles. We develop a typology of dogwhistles, curate the largest-to-date glossary of over 300 dogwhistles with rich contextual information and examples, and analyze their usage in historical U.S. politicians' speeches. We then assess whether a large language model (GPT-3) can identify dogwhistles and their meanings, and find that GPT-3's performance varies widely across types of dogwhistles and targeted groups. Finally, we show that harmful content containing dogwhistles avoids toxicity detection, highlighting online risks of such coded language. This work sheds light on the theoretical and applied importance of dogwhistles in both NLP and computational social science, and provides resources for future research in modeling dogwhistles and mitigating their online harms.

Paper Structure (36 sections, 10 figures, 10 tables)

Figures (10)

  • Figure 1: Schematic of how dogwhistles work, based on Henderson and McCready (2018), with the example of 'cosmopolitan'. First, a speaker simultaneously communicates the dogwhistle message and their persona (identity). The in-group recovers both the message content and the speaker persona, enabling them to arrive at the coded meaning (e.g. 'Jewish'). The out-group only recognizes the message's content and thus interprets it literally. This literal meaning also provides the speaker with plausible deniability; if confronted, the speaker can claim that they solely intended the literal meaning.
  • Figure 2: Visual hierarchical representation of our dogwhistle taxonomy along with examples of each type.
  • Figure 3: Frequency of speeches containing racial dogwhistles in the U.S. Congressional Record (as a fraction of total speeches) over time. The dotted red vertical lines represent noteworthy years. Use of racial dogwhistles began to increase during the Civil Rights Movement and their frequency continued to rise until the 1990s. Since the 1990s, the frequency of speeches containing dogwhistles has fluctuated but remained at overall high levels compared to earlier years.
  • Figure 4: Average ideology score (DW-NOMINATE first dimension) for speakers who used selected dogwhistles over time: welfare reform (top left), thug (top right), property rights (bottom left), and Willie Horton (bottom right). Higher values indicate that the dogwhistle's speakers were more conservative, while lower values indicate that the dogwhistle's speakers were more liberal. For visualization, trends are Lowess-smoothed.
  • Figure 5: Recall of GPT-3 dogwhistle surfacing separated by persona and register. Across all personae, GPT-3 surfaces under 20% of dogwhistles in the informal/online register. Performance is much higher for the formal/offline register but varies across personae, ranging from 44.8% (transphobic) to 100% (Islamophobic).
  • ...and 5 more figures
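
The quantity plotted in Figure 3 (speeches containing a dogwhistle, as a fraction of all speeches in a year) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `dogwhistle_rate_by_year`, the toy speeches, and the two-term glossary are assumptions made for illustration, and plain string matching cannot tell a coded use of a term from a literal one.

```python
import re
from collections import defaultdict


def dogwhistle_rate_by_year(speeches, glossary):
    """Return {year: fraction of speeches containing any glossary form}.

    speeches: iterable of (year, text) pairs (hypothetical input format).
    glossary: iterable of surface forms to search for.
    """
    # Case-insensitive, word-boundary match on any glossary form.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in glossary) + r")\b",
        re.IGNORECASE,
    )
    total = defaultdict(int)
    hits = defaultdict(int)
    for year, text in speeches:
        total[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}


# Toy data, invented for illustration only.
toy_speeches = [
    (1960, "We debate the farm bill today."),
    (1960, "This is about property rights."),
    (1995, "Welfare reform is overdue."),
    (1995, "Property rights and welfare reform matter."),
]
toy_glossary = ["welfare reform", "property rights"]

rates = dogwhistle_rate_by_year(toy_speeches, toy_glossary)
print(rates)
```

Counting each speech at most once, rather than counting every term occurrence, matches the caption's "fraction of total speeches" wording.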