Scalable Fact-checking with Human-in-the-Loop
Jing Yang, Didier Vega-Oliveros, Tais Seibt, Anderson Rocha
TL;DR
This paper tackles the scalability problem in fact-checking by proposing an unsupervised, two-stage pipeline that first clusters semantically similar social-media posts and then summarizes each cluster into a representative claim, enabling human fact-checkers to operate more efficiently. It evaluates two clustering methods (Agglomerative and Leiden) and four summarization strategies (extractive DG/MCI and abstractive BART/T5), complemented by a graph-based analysis and human-in-the-loop scoring. On MM-COVID data, the approach reduces 28,818 tweets to 700 summary claims while achieving strong clustering and summarization performance, with extractive methods and Leiden clustering generally performing best under ROUGE and human metrics. The work demonstrates a practical pathway to accelerate fact-checking at scale, while identifying key challenges for real-world deployment: obtaining ground-truth labels for clustering, the cost of computing full similarity matrices, and keeping summaries claim-focused.
Abstract
Researchers have been investigating automated solutions for fact-checking on a variety of fronts. However, current approaches often overlook the fact that the amount of information released every day is escalating, and that much of it overlaps. To accelerate fact-checking, we bridge this gap by grouping similar messages and summarizing them into aggregated claims. Specifically, we first clean a set of social media posts (e.g., tweets) and build a graph of all posts based on their semantics. Then, we apply two clustering methods to group the messages for further claim summarization. We evaluate the summaries both quantitatively with ROUGE scores and qualitatively with human evaluation. We also generate a graph of summaries to verify that there is no significant overlap among them. Our approach reduced 28,818 original messages to 700 summary claims, showing the potential to speed up the fact-checking process by organizing and selecting representative claims from massive, disorganized, and redundant messages.
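
The cluster-then-summarize idea can be illustrated with a small, dependency-free sketch. It is not the authors' implementation and makes several simplifying assumptions: bag-of-words cosine similarity stands in for semantic embeddings, connected components of a thresholded similarity graph stand in for Agglomerative or Leiden clustering, and the "summary" is simply the most central post of each cluster (a crude extractive choice). The function names, the example posts, and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words counts (a stand-in for sentence embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_posts(posts, threshold=0.5):
    """Link posts with similarity >= threshold; return connected components.

    Also returns the full pairwise similarity matrix, which is quadratic
    in the number of posts.
    """
    vecs = [vectorize(p) for p in posts]
    n = len(posts)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    # Union-find over the thresholded similarity graph.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values()), sim


def representative(cluster, sim):
    """Pick the post with the highest total similarity to its own cluster."""
    return max(cluster, key=lambda i: sum(sim[i][j] for j in cluster))


# Hypothetical posts, invented for illustration only.
posts = [
    "Drinking hot water cures the virus",
    "Hot water cures the virus, doctors say",
    "drinking hot water cures virus!!",
    "5G towers spread the virus",
    "The virus is spread by 5G towers",
]
clusters, sim = cluster_posts(posts)
claims = [posts[representative(c, sim)] for c in clusters]
for claim in claims:
    print(claim)
```

With these five posts, the sketch groups the three "hot water" posts and the two "5G" posts separately and prints one representative per group, so a fact-checker would review two claims instead of five. The quadratic similarity matrix built in `cluster_posts` is also where a toy version like this stops scaling.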
