Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish
Recep Firat Cekinel, Pinar Karagoz, Cagri Coltekin
TL;DR
This paper investigates cross-lingual learning for fact-checking in Turkish and introduces the Turkish Fact-Checking and Claims Repository (FCTR), a dataset of 3238 claims sourced from three Turkish fact-checking organizations. It evaluates zero-shot, few-shot, and fine-tuning strategies using large language models (notably LLaMA-2 with LoRA/QLoRA) alongside traditional baselines, and it examines neural machine translation as a bridge from Turkish to English. The key findings are that fine-tuning on native Turkish data yields substantial gains, cross-lingual prompting offers only modest improvements, and the effects of translation are mixed, which underscores the continued importance of collecting native data for low-resource languages. The work contributes a dataset and a set of experiments for Turkish misinformation detection, and it offers a framework for cross-lingual fact-checking research.
Abstract
The rapid spread of misinformation through social media platforms has raised concerns about its impact on public opinion. Although misinformation is prevalent in many languages, most research in this field has concentrated on English. As a result, datasets are scarce for other languages, including Turkish. To address this gap, we introduce the FCTR dataset, consisting of 3238 real-world claims. The dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations. We also assess the effectiveness of cross-lingual transfer learning for low-resource languages, with a particular focus on Turkish, and we report the in-context learning (zero-shot and few-shot) performance of large language models in this setting. The experimental results indicate that the dataset has the potential to advance research in the Turkish language.
