COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations

Rui Xing, Preslav Nakov, Timothy Baldwin, Jey Han Lau

TL;DR

We introduce COMMUNITYNOTES, a large-scale multilingual dataset of 104k posts with community-provided explanatory notes and helpfulness labels, capturing a crucial shift toward crowd-sourced fact-checking. We formalize the task of predicting both the helpfulness of notes and the reasons behind that judgment, and we propose an automatic prompt-optimization framework that generates and refines reason definitions and integrates them into predictive models. Empirical results show that optimized reason definitions substantially improve reason prediction and can enhance helpfulness prediction, with benefits transferring to evidence-sufficiency tasks and real-world fact-checking on the Climate-FEVER dataset. The work demonstrates the potential of explicit, machine-interpretable reason definitions to boost the explainability and effectiveness of crowd-based fact-checking systems, offering a scalable path toward more transparent misinformation mitigation.

Abstract

Fact-checking on major platforms, such as X, Meta, and TikTok, is shifting from expert-driven verification to a community-based setup, where users contribute explanatory notes to clarify why a post might be misleading. An important challenge here is determining whether an explanation is helpful for understanding real-world claims, and why, a question that remains largely underexplored in prior research. In practice, most community notes remain unpublished due to slow community annotation, and the reasons for helpfulness lack clear definitions. To bridge these gaps, we introduce the task of predicting both the helpfulness of explanatory notes and the reasons behind it. We present COMMUNITYNOTES, a large-scale multilingual dataset of 104k posts with user-provided notes and helpfulness labels. We further propose a framework that automatically generates and improves reason definitions via automatic prompt optimization, and integrates them into prediction. Our experiments show that the optimized definitions can improve both helpfulness and reason prediction. Finally, we show that helpfulness information is beneficial for existing fact-checking systems.

Paper Structure

This paper contains 34 sections, 5 figures, 12 tables.

Figures (5)

  • Figure 1: An example of X Community Notes. The user-generated note appears as "Readers added context."
  • Figure 2: The languages in COMMUNITYNOTES: fr-French, es-Spanish, ja-Japanese, pt-Portuguese, de-German, Other: note languages that appear fewer than 1,000 times.
  • Figure 3: Our framework for automatic reason definition generation and optimization.
  • Figure 4: A histogram of the Note Reason label frequencies. COMMUNITYNOTES contains 18 reason categories: 8 corresponding to helpful notes and 10 to not helpful notes.
  • Figure 5: Boxplot of tweet reactions in COMMUNITYNOTES. We use three metrics for reactions: number of likes, number of replies, and number of retweets.