Table of Contents
Fetching ...

A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification

Aryan Singhal, Veronica Shao, Gary Sun, Ryan Ding, Jonathan Lu, Kevin Zhu

TL;DR

The findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data and the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools.

Abstract

The rise of digital misinformation has heightened interest in using multilingual Large Language Models (LLMs) for fact-checking. This study systematically evaluates translation bias and the effectiveness of LLMs for cross-lingual claim verification across 15 languages from five language families: Romance, Slavic, Turkic, Indo-Aryan, and Kartvelian. Using the XFACT dataset to assess their impact on accuracy and bias, we investigate two distinct translation methods: pre-translation and self-translation. We use mBERT's performance on the English dataset as a baseline to compare language-specific accuracies. Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data. Furthermore, larger models demonstrate superior performance in self-translation, improving translation accuracy and reducing bias. These results highlight the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools and minimize the risk of spreading misinformation in different linguistic contexts.

A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification

TL;DR

The findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data and the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools.

Abstract

The rise of digital misinformation has heightened interest in using multilingual Large Language Models (LLMs) for fact-checking. This study systematically evaluates translation bias and the effectiveness of LLMs for cross-lingual claim verification across 15 languages from five language families: Romance, Slavic, Turkic, Indo-Aryan, and Kartvelian. Using the XFACT dataset to assess their impact on accuracy and bias, we investigate two distinct translation methods: pre-translation and self-translation. We use mBERT's performance on the English dataset as a baseline to compare language-specific accuracies. Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation in the training data. Furthermore, larger models demonstrate superior performance in self-translation, improving translation accuracy and reducing bias. These results highlight the need for balanced multilingual training, especially in low-resource languages, to promote equitable access to reliable fact-checking tools and minimize the risk of spreading misinformation in different linguistic contexts.

Paper Structure

This paper contains 24 sections, 1 equation, 2 figures, 13 tables.

Figures (2)

  • Figure 1: Flowchart illustrating the process for evaluating the claim verification performance of LLMs using Direct Inference, Self-Translation, and Pre-Translation.
  • Figure 2: Accuracy performance of Llama 3.1 models across different language families using Self-Translation and Pre-Translation techniques.