Automatic Fact-checking in English and Telugu
Ravi Kiran Chikkala, Tatiana Anikina, Natalia Skachkova, Ivan Vykopal, Rodrigo Agerri, Josef van Genabith
TL;DR
This work introduces Preethi, a bilingual English–Telugu fact-checking dataset derived from IFND, and benchmarks LLM-based veracity classification and justification generation using Simple Prompting and Retrieval-Augmented Generation (RAG) pipelines. English and Telugu claims are evaluated with multiple metrics across several retrieval strategies. English generally benefits from richer pretraining and context, while Automatic Scraping improves performance in both languages. The study releases open resources, including gold QA pairs, justifications, and code, and analyzes errors such as biases, hallucinations, retrieval failures, and translation issues. The findings indicate that RAG-based methods improve veracity classification, but that justification quality remains harder to achieve in Telugu, pointing to a need for more native Telugu data and human evaluation in future work.
Abstract
False information poses a significant global challenge, and manually verifying claims is time-consuming and resource-intensive. In this paper, we compare different approaches to investigate how effectively large language models (LLMs) classify factual claims by their veracity and generate justifications in English and Telugu. The key contributions of this work are the creation of a bilingual English–Telugu dataset and the benchmarking of different LLM-based veracity classification approaches.
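To make the distinction between Simple Prompting and RAG concrete, the following is a minimal sketch and not the authors' pipeline. The function names, the prompt wording, the two-way label set, and the token-overlap retriever are all illustrative assumptions; a real system would use an actual retriever and send the prompts to an LLM.

```python
import re
from typing import List

# Assumed label set for illustration; the dataset's actual labels may differ.
LABELS = ("true", "false")


def simple_prompt(claim: str) -> str:
    """Build a prompt that asks for a verdict from the claim alone."""
    return (
        "Classify the claim as true or false and justify your answer.\n"
        f"Claim: {claim}\n"
        "Label:"
    )


def retrieve(claim: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank documents by shared lowercase tokens."""
    claim_tokens = set(claim.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(claim_tokens & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def rag_prompt(claim: str, evidence: List[str]) -> str:
    """Build a prompt that places retrieved evidence before the claim."""
    evidence_block = "\n".join(f"- {passage}" for passage in evidence)
    return (
        "Using the evidence below, classify the claim as true or false "
        "and justify your answer.\n"
        f"Evidence:\n{evidence_block}\n"
        f"Claim: {claim}\n"
        "Label:"
    )


def parse_label(response: str) -> str:
    """Return the first label word found in a model response, else 'unknown'."""
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", response.lower())
    return match.group(1) if match else "unknown"
```

In this sketch the only difference between the two settings is the evidence block in the prompt, so the same `parse_label` step can score both against gold labels.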
