Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats
Kuleen Sasse, Carlos Aguirre, Isabel Cachola, Sharon Levy, Mark Dredze
TL;DR
This work introduces FETCH! as a benchmark for discovering emergent dog whistles in large social-media corpora and presents EarShot, a three-stage system that leverages sentence embeddings, vector databases, and selective LLM/BERT-based filtering to identify novel dog whistles. Across synthetic, balanced, and realistic data scenarios, state-of-the-art methods underperform dramatically (F_{0.5} < 0.05) while EarShot achieves notable gains, especially when using the PREDICT pipeline to maximize precision. The study provides a rigorous evaluation framework, analyzes the strengths and limitations of embedding-based, MLM-based, and hybrid approaches, and discusses practical and ethical implications for deployment in moderation and research. It also outlines future directions, such as hybridizing models and incorporating richer linguistic signals, to improve robustness to recency and context in dog whistle discovery.
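For readers unfamiliar with the metric quoted above, F_{0.5} is the standard F-beta score with beta = 0.5, which weights precision more heavily than recall. The sketch below uses the textbook definition. The function name `f_beta` and the example precision and recall values are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favours precision; beta > 1 favours recall; beta = 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical values chosen only to show how beta shifts the score.
p, r = 0.5, 0.1
print(f"F0.5 = {f_beta(p, r, beta=0.5):.4f}")
print(f"F1   = {f_beta(p, r, beta=1.0):.4f}")
```

With the same precision and recall, F_{0.5} comes out higher than F1 whenever precision exceeds recall, which is why it suits a discovery setting where false positives are costly to review.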
Abstract
WARNING: This paper contains content that may be upsetting or offensive to some readers. Dog whistles are coded expressions with dual meanings: one intended for the general public (outgroup) and another that conveys a specific message to an intended audience (ingroup). Often, these expressions are used to convey controversial political opinions while maintaining plausible deniability and slipping past content moderation filters. Identification of dog whistles relies on curated lexicons, which have trouble keeping up to date. We introduce FETCH!, a task for finding novel dog whistles in massive social media corpora. We find that state-of-the-art systems fail to achieve meaningful results across three distinct social media case studies. We present EarShot, a strong baseline system that combines the strengths of vector databases and Large Language Models (LLMs) to efficiently and effectively identify new dog whistles.
