GitSearch: Enhancing Community Notes Generation with Gap-Informed Targeted Search

Sahajpreet Singh, Kokil Jaidka, Min-Yen Kan

TL;DR

The paper addresses the moderation of online misinformation by targeting quality gaps in Community Notes (CNs) with a gap-informed framework called GitSearch. It proposes a three-stage pipeline (gap detection, targeted retrieval, and constrained synthesis) to align AI-generated CNs with human expectations, especially in cold-start scenarios. On PolBench, it reports near-full coverage (99%) and strong quality, surpassing human-authored notes in helpfulness (mean 3.87 vs. 3.36) and generic web-agent baselines in structured retrieval (59% win rate). The work also shows that incorporating existing notes as context improves performance, and it highlights remaining challenges in ambiguity resolution and context reconstruction. Overall, GitSearch offers a scalable, evidence-grounded approach to improving the reliability and usefulness of community-based moderation, while acknowledging the need for human oversight and iterative refinement.

Abstract

Community-based moderation offers a scalable alternative to centralized fact-checking, yet it faces significant structural challenges, and existing AI-based methods fail in "cold start" scenarios. To tackle these challenges, we introduce GitSearch (Gap-Informed Targeted Search), a framework that treats human-perceived quality gaps, such as missing context, as first-class signals. GitSearch has a three-stage pipeline: identifying information deficits, executing real-time targeted web retrieval to resolve them, and synthesizing platform-compliant notes. To facilitate evaluation, we present PolBench, a benchmark of 78,698 U.S. political tweets with their associated Community Notes. We find that GitSearch achieves 99% coverage, almost doubling the coverage of the state of the art. GitSearch surpasses human-authored helpful notes with a 69% win rate and superior helpfulness scores (3.87 vs. 3.36), demonstrating retrieval effectiveness that balances the trade-off between scale and quality.
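
The three stages named in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the function names, the keyword heuristics in `detect_gaps`, the injected `search` callable, and the 280-character limit are all assumptions made for illustration. Only the stage order and the gap labels (which match the Figure 3 caption) come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evidence:
    gap: str      # gap type this evidence is meant to resolve
    url: str
    snippet: str


# Hypothetical search interface: query -> list of (url, snippet) results.
SearchFn = Callable[[str], List[Tuple[str, str]]]


def detect_gaps(tweet: str, existing_notes: List[str]) -> List[str]:
    """Stage 1: identify information deficits (toy heuristics, not the paper's detector)."""
    gaps = []
    if not existing_notes:
        gaps.append("missing coverage")       # cold start: no notes exist yet
    if "studies show" in tweet.lower():
        gaps.append("unsubstantiated claim")  # claim made without a cited source
    return gaps


def targeted_retrieval(tweet: str, gaps: List[str], search: SearchFn) -> List[Evidence]:
    """Stage 2: issue one query per detected gap and keep the top result."""
    evidence = []
    for gap in gaps:
        query = f"{gap}: {tweet}"
        for url, snippet in search(query)[:1]:
            evidence.append(Evidence(gap, url, snippet))
    return evidence


def synthesize_note(evidence: List[Evidence], max_chars: int = 280) -> str:
    """Stage 3: build a note that cites a source and respects an assumed length limit."""
    if not evidence:
        return ""
    best = evidence[0]
    room = max(0, max_chars - len(best.url) - 1)  # reserve space for the URL and a space
    return f"{best.snippet[:room]} {best.url}"


def generate_note(tweet: str, existing_notes: List[str], search: SearchFn) -> str:
    gaps = detect_gaps(tweet, existing_notes)
    return synthesize_note(targeted_retrieval(tweet, gaps, search))


def fake_search(query: str) -> List[Tuple[str, str]]:
    """Stand-in for a real web search; returns one fixed, made-up result."""
    return [("https://example.org/report",
             "The cited figure is not supported by the published report.")]


note = generate_note("Studies show X doubled last year.", [], fake_search)
print(note)
```

Passing the search function in as a parameter keeps the sketch runnable offline; a real system would replace `fake_search` with live retrieval and the heuristics with a learned gap detector.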

Paper Structure (39 sections, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Example comparison of human-written and GitSearch-generated Community Notes addressing a misleading tweet.
  • Figure 2: Overview of our three-phase (in black numbering) Gap-Informed Targeted Search (GitSearch) framework.
  • Figure 3: LLM-as-a-Judge scores for GitSearch-generated notes, grouped by the gap types present. Abbreviations: CON: contradiction, MIS_CON: missing context, MIS_COV: missing coverage, SOU_VER: source verification, UNS_CLA: unsubstantiated claim, and VAG_REF: vague reference.
  • Figure 4: Pairwise win rates in human evaluation. Each cell shows the proportion of times the row was preferred over the column.
  • Figure 5: Prompt template for the Supernote-lite baseline. The model is instructed to synthesize a single "Super Community Note" by aggregating key points from existing notes, weighted by their helpfulness scores, without performing new external retrieval.
  • ...and 6 more figures
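
The Figure 4 caption describes a pairwise win-rate matrix in which each cell is the proportion of times the row system was preferred over the column system. A minimal Python sketch of that computation follows. The function name, the `(winner, loser)` input format, and the toy judgments are assumptions for illustration and are not data from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pairwise_win_rates(
    judgments: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], float]:
    """Map (row, col) to the fraction of row-vs-col comparisons won by row.

    Each judgment is a (winner, loser) pair from one head-to-head comparison.
    """
    wins: Dict[Tuple[str, str], int] = defaultdict(int)
    for winner, loser in judgments:
        wins[(winner, loser)] += 1

    rates: Dict[Tuple[str, str], float] = {}
    for (a, b), count in list(wins.items()):
        total = count + wins.get((b, a), 0)  # all comparisons between a and b
        rates[(a, b)] = count / total
        rates[(b, a)] = wins.get((b, a), 0) / total
    return rates


# Toy judgments (made up): system "A" preferred over system "B" in 3 of 4 comparisons.
toy = [("A", "B")] * 3 + [("B", "A")]
rates = pairwise_win_rates(toy)
print(rates[("A", "B")], rates[("B", "A")])
```

With no ties, the two cells for a pair sum to 1, so one half of such a matrix determines the other.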