GitSearch: Enhancing Community Notes Generation with Gap-Informed Targeted Search
Sahajpreet Singh, Kokil Jaidka, Min-Yen Kan
TL;DR
The paper addresses the moderation of online misinformation by targeting quality gaps in Community Notes with a gap-informed framework called GitSearch. It proposes a three-stage pipeline (gap detection, targeted retrieval, and constrained synthesis) to align AI-generated Community Notes with human expectations, especially in cold-start scenarios. Evaluated on PolBench, GitSearch achieves near-full coverage (99%) and strong quality, surpassing human-authored notes in helpfulness (mean 3.87 vs. 3.36) and generic web-agent baselines in structured retrieval (win rate 59%). The work also shows that incorporating existing notes as context improves performance, and it highlights remaining challenges in ambiguity resolution and context reconstruction. Overall, GitSearch offers a scalable, evidence-grounded approach to improving the reliability and usefulness of community-based moderation, while acknowledging the need for human oversight and iterative refinement.
Abstract
Community-based moderation offers a scalable alternative to centralized fact-checking, yet it faces significant structural challenges, and existing AI-based methods fail in "cold start" scenarios. To tackle these challenges, we introduce GitSearch (Gap-Informed Targeted Search), a framework that treats human-perceived quality gaps, such as missing context, as first-class signals. GitSearch has a three-stage pipeline: identifying information deficits, executing real-time targeted web retrieval to resolve them, and synthesizing platform-compliant notes. To facilitate evaluation, we present PolBench, a benchmark of 78,698 U.S. political tweets with their associated Community Notes. We find that GitSearch achieves 99% coverage, almost doubling the coverage of the state of the art. GitSearch surpasses human-authored helpful notes with a 69% win rate and superior helpfulness scores (3.87 vs. 3.36), demonstrating retrieval effectiveness that balances the trade-off between scale and quality.
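The three-stage pipeline described above can be pictured as a simple control flow. The following is only an illustrative sketch, not the authors' implementation: every name in it (`Gap`, `Note`, `detect_gaps`, `synthesize`, `generate_note`, the `search` callable, and the `MAX_NOTE_CHARS` limit) is a hypothetical placeholder, and the gap-detection rule is a toy heuristic standing in for whatever model the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumed character budget, used only to illustrate "platform-compliant" output.
MAX_NOTE_CHARS = 280

# A search result is modeled as a (url, snippet) pair; this shape is an assumption.
SearchResult = Tuple[str, str]


@dataclass
class Gap:
    kind: str   # e.g. "missing_source" or "missing_context"
    query: str  # targeted search query intended to resolve this gap


@dataclass
class Note:
    text: str
    sources: List[str] = field(default_factory=list)


def detect_gaps(tweet: str) -> List[Gap]:
    """Stage 1 (toy heuristic): flag a gap when the tweet cites no link."""
    gaps: List[Gap] = []
    if "http" not in tweet:
        gaps.append(Gap("missing_source", f"source for: {tweet}"))
    return gaps


def synthesize(evidence: List[SearchResult]) -> Note:
    """Stage 3: join evidence snippets and enforce the assumed length limit."""
    text = " ".join(snippet for _, snippet in evidence)[:MAX_NOTE_CHARS]
    return Note(text=text, sources=[url for url, _ in evidence])


def generate_note(
    tweet: str, search: Callable[[str], List[SearchResult]]
) -> Optional[Note]:
    """Run gap detection, then targeted retrieval, then constrained synthesis."""
    gaps = detect_gaps(tweet)
    if not gaps:
        return None  # nothing to address, so no note is produced
    evidence: List[SearchResult] = []
    for gap in gaps:
        evidence.extend(search(gap.query))  # Stage 2: one query per gap
    if not evidence:
        return None  # never emit a note without supporting evidence
    return synthesize(evidence)
```

In this sketch the retrieval backend is passed in as a plain callable, so a stub can replace a live web search during testing. The key design point it tries to convey is that search queries are derived from detected gaps rather than from the tweet as a whole.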
