FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning

Yufeng Li; Rrubaa Panchendrarajan; Arkaitz Zubiaga

TL;DR

This study investigates the application of eight prominent open-source LLMs with fine-tuning and prompt engineering to identify check-worthy statements from political transcriptions and proposes a two-step data pruning approach to automatically identify high-quality training data instances for effective learning.

Abstract

The rapid dissemination of information through social media and the Internet poses a significant challenge for fact-checking, including the identification of check-worthy claims that fact-checkers should pay attention to, i.e. filtering claims that need fact-checking from a large pool of sentences. This challenge underscores the need to determine the priority of claims, specifically which claims are worth fact-checking. Despite advancements in this area in recent years, the application of large language models (LLMs), such as GPT, has only recently drawn attention, and many open-source LLMs remain underexplored. This study therefore investigates the application of eight prominent open-source LLMs with fine-tuning and prompt engineering to identify check-worthy statements from political transcriptions. We further propose a two-step data pruning approach to automatically identify high-quality training data instances for effective learning. The efficiency of our approach is demonstrated through evaluations on the English-language dataset of the check-worthiness estimation task of CheckThat! 2024. Experiments with data pruning show that competitive performance can be achieved with only about 44% of the training data. Our team ranked first in the check-worthiness estimation task for English.
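The paper's outline names Condensed Nearest Neighbour as the under-sampling technique in the second pruning step. As a rough illustration only, the following sketch shows Hart's condensed nearest neighbour rule adapted for under-sampling. It is not the authors' implementation: the function name, the 2D points standing in for sentence representations, the Euclidean distance, and the choice to keep the `minority_label` class whole are all assumptions made for this example.

```python
import math
import random


def condensed_nearest_neighbour(points, labels, minority_label, seed=0):
    """Return sorted indices of the instances kept after under-sampling.

    Assumption for this sketch: every minority instance is kept, and a
    majority instance is kept only if a 1-nearest-neighbour classifier
    built on the current kept set would misclassify it.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    store = [i for i, y in enumerate(labels) if y == minority_label]
    if not majority or not store:
        # Nothing to condense if one of the two groups is empty.
        return sorted(store + majority)

    # Seed the store with one randomly chosen majority instance.
    store.append(rng.choice(majority))
    kept = set(store)

    changed = True
    while changed:  # repeat passes until no instance is added
        changed = False
        for i in majority:
            if i in kept:
                continue
            # 1-NN prediction using only the instances kept so far.
            nearest = min(store, key=lambda j: math.dist(points[i], points[j]))
            if labels[nearest] != labels[i]:
                store.append(i)
                kept.add(i)
                changed = True
    return sorted(store)
```

On a toy set with two well-separated clusters, the rule keeps all minority points and a single majority point, because every other majority point is already classified correctly by its nearest kept neighbour. Majority points near the class boundary would be retained instead.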

Paper Structure (18 sections, 6 figures, 9 tables)

Introduction
Methodology
Dataset
LLMs for Check-worthy Statement Detection
Large Language Models
Prompt Engineering
Effective Fine-tuning
Data Pruning for Effective Learning
Step 1 - Identifying Informative Sentences
Step 2 - Under Sampling using Condensed Nearest Neighbour
Results
Hyper-parameters and Environment Setting
Evaluation Metrics
Comparison of LLMs
Consistency Analysis
...and 3 more sections

Figures (6)

Figure 1: Distribution of Text Length in each Partition
Figure 2: Word cloud indicating verb types and their frequencies in the training data.
Figure 3: Distribution of verb types in the training data
Figure 4: 2D visualization of the Training Data.
Figure 5: Consistency of Llama2-7b with Iterations
...and 1 more figure
