
From Chaos to Clarity: Claim Normalization to Empower Fact-Checking

Megha Sundriyal, Tanmoy Chakraborty, Preslav Nakov

TL;DR

This work defines Claim Normalization (ClaimNorm), the task of distilling a social media post into its central, verifiable assertion, to close the gap between noisy online content and the needs of fact-checkers. It proposes CACN, a chain-of-thought and check-worthiness-aware framework that uses in-context learning with large language models to generate concise normalized claims, and it presents CLAN, a real-world dataset of 6,388 post–normalized-claim pairs collected from Google Fact Check Explorer and the ClaimReview schema. Experiments show that CACN outperforms strong baselines on both lexical and semantic metrics; prompt tuning and in-context learning yield substantial gains, and zero-shot results indicate that large language models already have a notable inherent ability to normalize claims. The work also discusses limitations, data biases, and environmental considerations, and outlines future directions, including multilingual and multimodal extensions, to broaden its impact on automated fact-checking pipelines.
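The TL;DR reports results on lexical and semantic metrics without naming them. To illustrate what a lexical metric measures, here is a simplified ROUGE-1-style F1 (unigram overlap) between a generated and a reference normalized claim. It assumes lowercase whitespace tokenization with no stemming; the function name `rouge1_f1` and the example claims are hypothetical, and this is not the paper's evaluation code.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated and a reference normalized claim.

    Simplified ROUGE-1 (illustrative assumption): lowercase whitespace
    tokenization, no stemming, clipped counts via Counter intersection.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up claims, for illustration only.
    generated = "Drinking hot water cures COVID-19"
    gold = "Hot water kills the coronavirus"
    print(round(rouge1_f1(generated, gold), 3))  # prints 0.4
```

Because it only counts shared words, a score like this gives no credit for paraphrases such as "COVID-19" versus "coronavirus", which is why semantic metrics are typically reported alongside it.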

Abstract

With the rise of social media, users are exposed to many misleading claims. However, the pervasive noise inherent in these posts presents a challenge in identifying precise and prominent claims that require verification. Extracting the important claims from such posts is arduous and time-consuming, yet it is an underexplored problem. Here, we aim to bridge this gap. We introduce a novel task, Claim Normalization (aka ClaimNorm), which aims to decompose complex and noisy social media posts into more straightforward and understandable forms, termed normalized claims. We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation, mimicking human reasoning processes, to comprehend intricate claims. Moreover, we capitalize on the in-context learning capabilities of large language models to provide guidance and to improve claim normalization. To evaluate the effectiveness of our proposed model, we meticulously compile a comprehensive real-world dataset, CLAN, comprising more than 6k instances of social media posts alongside their respective normalized claims. Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures. Finally, our rigorous error analysis sheds light on both CACN's capabilities and its pitfalls.
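The abstract describes CACN as combining chain-of-thought reasoning, check-worthiness estimation, and in-context learning. The following is a minimal sketch of how such a prompt could be assembled. The instruction wording, the `Demo` structure, the `Post:`/`Reasoning:`/`Normalized claim:` field labels, and the examples are all hypothetical; the actual CACN template (Figures 3 and 5) is not reproduced in this summary, and the call to the language model itself is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    """One in-context example: a post, a short rationale, and its gold claim."""
    post: str
    rationale: str
    normalized_claim: str


# Hypothetical instruction text; the paper's actual template is not reproduced here.
INSTRUCTION = (
    "Identify the central claim in the social media post. Decide which part is "
    "check-worthy, reason step by step, and state it as one concise, verifiable sentence."
)


def build_prompt(post: str, demos: List[Demo]) -> str:
    """Concatenate the instruction, worked demonstrations, and the query post."""
    lines = [INSTRUCTION, ""]
    for d in demos:
        lines += [
            f"Post: {d.post}",
            f"Reasoning: {d.rationale}",
            f"Normalized claim: {d.normalized_claim}",
            "",
        ]
    # The query ends at "Reasoning:" so the model writes its chain of thought
    # first and then, following the demonstrations, the normalized claim.
    lines += [f"Post: {post}", "Reasoning:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up demonstration, for illustration only.
    demo = Demo(
        post="BREAKING!!! 5G towers are spreading the virus, share before they delete this",
        rationale="The urgency and the call to share are not verifiable; the factual "
        "assertion is a causal link between 5G towers and the virus.",
        normalized_claim="5G towers spread the virus.",
    )
    print(build_prompt("Wow, the moon landing was filmed in a studio. Wake up!", [demo]))
```

Ending the prompt at `Reasoning:` nudges the model to write its rationale before the claim, mirroring the format of the demonstrations.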

Paper Structure

This paper contains 38 sections, 1 equation, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Illustration of our proposed Claim Normalization task, highlighting the normalized claims authored by fact-checkers for social media posts from different social media platforms.
  • Figure 2: Histogram of the cosine similarity between the social media posts and the corresponding normalized claims from our CLAN dataset.
  • Figure 3: Illustration of our proposed approach. To generate a normalized claim, we use the CACN prompt template, which combines an explicit task instruction, relevant in-context examples, and chain-of-thought reasoning.
  • Figure 4: Box plot of the number of tokens per normalized claim in CLAN.
  • Figure 5: Our templates for in-context learning prompts used for GPT-3 (text-davinci-003).
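Figure 2 plots the cosine similarity between each post and its normalized claim. Here is a sketch of that computation and the histogram binning, using term-frequency vectors. This representation is an assumption made to keep the example self-contained, since the text representation behind Figure 2 is not stated here (sentence embeddings would be a common alternative). The function names and example pairs are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between term-frequency vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0  # empty text -> similarity 0


def histogram(values: List[float], bins: int = 10) -> List[int]:
    """Count values in equal-width bins over [0, 1]; 1.0 falls in the last bin."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Made-up post/claim pairs, for illustration only.
    pairs: List[Tuple[str, str]] = [
        ("omg 5G towers are spreading the virus share now", "5G towers spread the virus"),
        ("the moon landing was fake wake up people", "The moon landing was staged"),
    ]
    sims = [cosine_similarity(p, c) for p, c in pairs]
    print([round(s, 2) for s in sims], histogram(sims))
```

Since term-frequency vectors are non-negative, scores stay in [0, 1], so fixed-width bins over that interval suffice; with signed embeddings the bin range would need to cover [-1, 1].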