Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance?

Qisheng Hu, Quanyu Long, Wenya Wang

TL;DR

The paper investigates how the Decompose-Then-Verify pipeline affects downstream fact-checking performance, addressing why decomposition sometimes helps and other times hurts. It offers a rigorous error taxonomy for decomposition, and formalizes a trade-off model showing that gains from simpler sub-claims can be offset by retrieval and decomposition noise. Through experiments across claim- and response-level data and multiple verifiers, the study shows that decomposition benefits weaker verifiers and more complex inputs, but can degrade performance for stronger verifiers or simpler inputs. The findings guide future work toward designing decomposition strategies that adapt to input complexity and verifier strength, with broader implications for robust, scalable automated fact-checking systems.

Abstract

Fact-checking pipelines increasingly adopt the Decompose-Then-Verify paradigm, where texts are broken down into smaller claims for individual verification and subsequently combined for a veracity decision. While decomposition is widely adopted in such pipelines, its effects on final fact-checking performance remain underexplored. Some studies have reported improvements from decomposition, while others have observed performance declines, indicating its inconsistent impact. To date, no comprehensive analysis has been conducted to understand this variability. To address this gap, we present an in-depth analysis that explicitly examines the impact of decomposition on downstream verification performance. Through error case inspection and experiments, we introduce a categorization of decomposition errors and reveal a trade-off between accuracy gains and the noise introduced through decomposition. Our analysis provides new insights into the instability of current systems and offers guidance for future studies toward improving claim decomposition in fact-checking pipelines.

Paper Structure

This paper contains 67 sections, 3 equations, 3 figures, 15 tables.

Figures (3)

  • Figure 1: An overview of the Decompose-Then-Verify pipeline employed in this study, which comprises four key stages: decomposition, retrieval, verification, and aggregation of sub-claim results. This figure illustrates how different decomposition methods, such as FactScore (Min et al., 2023) and VeriScore (Song et al., 2024), can lead to divergent decomposition outcomes. In this example, FactScore generates ambiguous sub-claims, while VeriScore omits key information (e.g., "Ultimately, the success of the Su-57...") from the input.
  • Figure 2: Error distribution when applying FactScore and VeriScore decomposition to claim-level (WICE, CLAIMDECOMP) and response-level (FELM, BINGCHAT) datasets.
  • Figure 3: The heatmap illustrates F1-scores across varying decomposed sub-claim counts and input complexity levels. Performance generally improves initially as the number of decomposed sub-claims increases, followed by a decline. We find that ensuring the number of sub-claims remains below the input complexity level helps sustain the positive effects of decomposition.
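
The four stages named in the Figure 1 caption (decomposition, retrieval, verification, aggregation) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `decompose_then_verify`, the `SubClaimResult` type, the toy stage functions, and the in-memory `CORPUS` are all hypothetical names introduced here. The aggregation rule (the input counts as supported only if every sub-claim is supported) is also an assumption, since the summary above does not state which rule the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubClaimResult:
    """Outcome for one sub-claim (hypothetical container, not from the paper)."""
    sub_claim: str
    evidence: List[str]
    supported: bool


def decompose_then_verify(
    text: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], bool],
) -> Tuple[bool, List[SubClaimResult]]:
    """Run the four stages in order and return (overall verdict, per-sub-claim results)."""
    results: List[SubClaimResult] = []
    for sub_claim in decompose(text):              # stage 1: decomposition
        evidence = retrieve(sub_claim)             # stage 2: retrieval
        supported = verify(sub_claim, evidence)    # stage 3: verification
        results.append(SubClaimResult(sub_claim, evidence, supported))
    # Stage 4: aggregation. ASSUMPTION: the input is supported only if it
    # produced at least one sub-claim and every sub-claim is supported.
    overall = bool(results) and all(r.supported for r in results)
    return overall, results


# Toy stand-ins for the real stages (all hypothetical, for illustration only).
CORPUS = [
    "Paris is the capital of France",
    "The Seine flows through Paris",
]


def toy_decompose(text: str) -> List[str]:
    # Naive split on " and "; real decomposers are typically LLM-based.
    return [part.strip() for part in text.split(" and ") if part.strip()]


def toy_retrieve(sub_claim: str) -> List[str]:
    # Return corpus sentences that share at least one word with the sub-claim.
    words = set(sub_claim.lower().split())
    return [doc for doc in CORPUS if words & set(doc.lower().split())]


def toy_verify(sub_claim: str, evidence: List[str]) -> bool:
    # Exact-match check; a real verifier would be a trained or prompted model.
    return any(sub_claim.lower() == doc.lower() for doc in evidence)


verdict, details = decompose_then_verify(
    "Paris is the capital of France and Paris is in Spain",
    toy_decompose,
    toy_retrieve,
    toy_verify,
)
```

Because each stage is passed in as a plain callable, one stage can be swapped (for example, a different decomposer) while the rest of the pipeline stays fixed, which is the kind of comparison the figure describes. Under the assumed all-must-hold aggregation, a single noisy or unsupported sub-claim flips the overall verdict, which gives some intuition for how decomposition noise can offset the gains from simpler sub-claims.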