"Not in My Backyard": LLMs Uncover Online and Offline Social Biases Against Homelessness
Jonathan A. Karr, Benjamin F. Herbst, Matthew L. Sisk, Xueyun Li, Ting Hua, Matthew Hauenstein, Georgina Curto, Nitesh V. Chawla
TL;DR
This work addresses biases against people experiencing homelessness (PEH) by building a novel multi-domain dataset spanning Reddit, X, news, and city council minutes across 10 U.S. cities, and by introducing a 16-category bias taxonomy with a dedicated Negative Bias Frame. It shows that a small set of gold-standard annotations is insufficient for training, and that GPT-generated pseudo-labels enable knowledge distillation into smaller, privacy-preserving models; increasing data quantity can rival or exceed the gains from using a larger model. The study finds that negative biases against PEH are most prevalent online, especially on Reddit, and that NIMBY frames strongly engage audiences, while offline discourse is comparatively more solution-oriented. These findings provide actionable guidance for platform moderation and policy communication in support of homelessness reduction efforts.
Abstract
Homelessness is a persistent social challenge, impacting millions worldwide. Over 876,000 people experienced homelessness in the U.S. in 2025. Social bias is a significant barrier to alleviation, shaping public perception and influencing policymaking. Because online textual media and offline city council discourse both reflect and influence part of public opinion, identifying and tracking social biases against people experiencing homelessness (PEH) in these sources provides valuable insights. We present a new, manually annotated multi-domain dataset compiled from Reddit, X (formerly Twitter), news articles, and city council meeting minutes across ten U.S. cities. Our 16-category multi-label taxonomy creates a challenging long-tail classification problem: some categories appear in fewer than 1% of samples, while others appear in more than 70%. We find that small human-annotated datasets (1,702 samples) are insufficient for training effective classifiers, whether used to fine-tune encoder models or as few-shot examples for LLMs. To address this, we use GPT-4.1 to generate pseudo-labels on a larger unlabeled corpus. Training on this expanded dataset enables even small encoder models (ModernBERT, 150M parameters) to achieve 35.23 macro-F1, approaching GPT-4.1's 41.57. This demonstrates that data quantity matters more than model size, enabling low-cost, privacy-preserving deployment without relying on commercial APIs. Our results reveal that negative bias against PEH is prevalent both offline and online (especially on Reddit), with "not in my backyard" narratives showing the highest engagement. These findings uncover a type of ostracism that directly impacts poverty-reduction policymaking and provide actionable insights for practitioners addressing homelessness.
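The abstract reports macro-F1 on a long-tail, multi-label problem, so it may help to see why that metric is demanding when some categories are rare. The sketch below is illustrative only and is not the authors' evaluation code: the three-label toy data and the function names (`label_counts`, `macro_f1`, `micro_f1`) are assumptions made for this example, and the paper's taxonomy has 16 categories rather than 3.

```python
from typing import List, Tuple

Labels = List[List[int]]  # one row per sample, one 0/1 entry per category


def label_counts(y_true: Labels, y_pred: Labels, j: int) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for category j."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-category F1: rare categories count as much as common ones."""
    n_labels = len(y_true[0])
    scores = [f1_from_counts(*label_counts(y_true, y_pred, j)) for j in range(n_labels)]
    return sum(scores) / n_labels


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 over pooled counts: dominated by frequent categories."""
    n_labels = len(y_true[0])
    totals = [label_counts(y_true, y_pred, j) for j in range(n_labels)]
    tp = sum(c[0] for c in totals)
    fp = sum(c[1] for c in totals)
    fn = sum(c[2] for c in totals)
    return f1_from_counts(tp, fp, fn)


# Hypothetical toy data: category 0 is common, categories 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
# A classifier that only ever predicts the common category.
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]

print(f"macro-F1: {macro_f1(y_true, y_pred):.3f}")  # 0.333
print(f"micro-F1: {micro_f1(y_true, y_pred):.3f}")  # 0.800
```

In this toy case a model that ignores the rare categories still scores 0.800 micro-F1 but only 0.333 macro-F1, which is one way to read the reported scores: with categories occurring in fewer than 1% of samples, each missed tail category pulls the macro average down by the same amount as a missed common one.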
