Word2winners at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

Amirmohammad Azadi, Sina Zamani, Mohammadmostafa Rostamkhani, Sauleh Eetemadi

TL;DR

The paper presents Word2winners, a system for SemEval-2025 Task 7 that retrieves previously fact-checked claims across many languages. The pipeline combines data preprocessing, content summarization, zero-shot evaluation of multilingual encoders, and fine-tuning with a contrastive loss, followed by a majority-voting ensemble. Machine translation is used to improve crosslingual alignment. The best model reaches about 85% accuracy on crosslingual data and 92% on monolingual data, and the results show that fine-tuning and ensembling help most in the crosslingual setting. The work is relevant to mitigating misinformation across languages. The authors point to future work on handling informal language, improving preprocessing, and expanding language resources to make the system more robust.
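The TL;DR names a majority-voting ensemble but does not say how votes are counted. The following is a minimal sketch under two assumptions. First, each fine-tuned model returns a ranked list of fact-check IDs for a post. Second, each model casts one vote for every ID in its own top-k. The tie-breaking rule (summed rank positions, then ID) and the name `majority_vote` are our illustrative choices, not the authors' implementation.

```python
from __future__ import annotations

from collections import defaultdict


def majority_vote(rankings: list[list[str]], k: int = 10) -> list[str]:
    """Fuse several models' ranked fact-check IDs into one top-k list.

    Hypothetical scheme: every model casts one vote for each ID in its own
    top-k. IDs are ordered by vote count (descending), then by the sum of
    their rank positions across the models that voted for them (ascending),
    then by ID so that the output is deterministic.
    """
    votes: dict[str, int] = defaultdict(int)
    rank_sum: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        for position, fc_id in enumerate(ranking[:k]):
            votes[fc_id] += 1
            rank_sum[fc_id] += position
    ordered = sorted(votes, key=lambda fc_id: (-votes[fc_id], rank_sum[fc_id], fc_id))
    return ordered[:k]


# Three models' top-3 predictions for one social media post (toy IDs).
predictions = [
    ["fc1", "fc2", "fc3"],
    ["fc2", "fc1", "fc4"],
    ["fc2", "fc5", "fc1"],
]
print(majority_vote(predictions, k=3))  # ['fc2', 'fc1', 'fc5']
```

Both `fc1` and `fc2` receive three votes. `fc2` ranks first because it sits higher in the individual lists. Rank-aware tie-breaking is one reasonable choice; plain vote counts with arbitrary tie order would also qualify as majority voting.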

Abstract

This paper describes our system for SemEval-2025 Task 7: Previously Fact-Checked Claim Retrieval. The task requires retrieving relevant fact-checks for a given input claim from the extensive, multilingual MultiClaim dataset, which comprises social media posts and fact-checks in several languages. To address this challenge, we first evaluated zero-shot performance using state-of-the-art English and multilingual retrieval models and then fine-tuned the most promising systems, leveraging machine translation to enhance crosslingual retrieval. Our best model achieved an accuracy of 85% on crosslingual data and 92% on monolingual data.
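The abstract mentions fine-tuning the most promising retrievers, and the TL;DR says the fine-tuning uses a contrastive loss. One common variant is an in-batch-negatives (InfoNCE-style) loss. In this loss, post i must score its paired fact-check i above every other fact-check in the same batch. This sketch rests on several assumptions. We do not know whether the authors used exactly this formulation. The `scale` value of 20.0 and the helper names are hypothetical. A real setup would compute the loss on encoder outputs inside an autodiff framework rather than on plain Python lists.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def in_batch_contrastive_loss(
    posts: list[list[float]], fact_checks: list[list[float]], scale: float = 20.0
) -> float:
    """Mean cross-entropy over the batch: post i's positive is fact-check i,
    and every other fact-check in the batch acts as a negative."""
    if len(posts) != len(fact_checks):
        raise ValueError("posts and fact_checks must be paired one-to-one")
    total = 0.0
    for i, post in enumerate(posts):
        logits = [scale * cosine(post, fc) for fc in fact_checks]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(posts)


# Correct pairs should yield a much lower loss than shuffled pairs.
aligned = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

The design avoids mining negatives explicitly, because the other fact-checks in the batch serve as negatives. This is why larger batches tend to give a stronger training signal for this kind of loss.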

Paper Structure

This paper contains 11 sections, 1 equation, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Types of information retrieval (monolingual, multilingual, and crosslingual), as illustrated by PANCHENDRARAJAN2024100066
  • Figure 2: Zero-shot vs Fine-Tuning Performance (S@10)
  • Figure 3: The best model's performance (S@10) on different languages
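Figures 2 and 3 report S@10. We read S@10 as success at 10: the share of posts for which at least one correct fact-check appears among the 10 highest-ranked candidates. This reading is our assumption, since this summary does not define the metric. Under it, the evaluation can be sketched as cosine-similarity retrieval over precomputed embeddings followed by a hit count. The embeddings, IDs, and function names are hypothetical. Any multilingual encoder could supply the vectors.

```python
from __future__ import annotations

import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: list[float], corpus: dict[str, list[float]], k: int = 10) -> list[str]:
    """Rank fact-check IDs by cosine similarity to the query embedding, best first."""
    ranked = sorted(corpus, key=lambda fc_id: cosine(query, corpus[fc_id]), reverse=True)
    return ranked[:k]


def success_at_k(retrieved: dict[str, list[str]], gold: dict[str, set[str]], k: int = 10) -> float:
    """Share of posts whose top-k list contains at least one gold fact-check."""
    hits = sum(1 for post_id, ranked in retrieved.items() if gold[post_id] & set(ranked[:k]))
    return hits / len(retrieved)


# Toy 2-D embeddings standing in for encoder outputs.
corpus = {"fc_a": [1.0, 0.0], "fc_b": [0.0, 1.0], "fc_c": [0.7, 0.7]}
posts = {"p1": [0.9, 0.1], "p2": [0.0, 1.0]}
gold = {"p1": {"fc_a"}, "p2": {"fc_a"}}

retrieved = {pid: retrieve_top_k(vec, corpus, k=3) for pid, vec in posts.items()}
print(success_at_k(retrieved, gold, k=1))  # 0.5
print(success_at_k(retrieved, gold, k=3))  # 1.0
```

Because the metric only asks whether any gold item lands in the top k, it rewards recall within a short list. This suits a setting where human fact-checkers review a handful of candidates per post.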