Word2winners at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
Amirmohammad Azadi, Sina Zamani, Mohammadmostafa Rostamkhani, Sauleh Eetemadi
TL;DR
The paper presents Word2winners, a system for SemEval-2025 Task 7 that retrieves previously fact-checked claims across many languages. The pipeline combines data preprocessing, content summarization, zero-shot evaluation of multilingual encoders, and fine-tuning of the most promising models with a contrastive loss. A majority voting ensemble and machine translation are then used to improve crosslingual alignment. The best model reaches about 85% accuracy on crosslingual data and 92% on monolingual data, and the results show that fine-tuning and ensembling are especially beneficial for crosslingual retrieval. The work has practical implications for countering misinformation across languages. It also points to future work on handling informal language, improving preprocessing, and expanding language resources to make the system more robust.
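The majority voting ensemble can be illustrated with a small sketch. It assumes each model returns a ranked top-k list of fact-check IDs for a post. Every model casts one vote for each ID in its list, and ties are broken by the summed rank positions. The function name `majority_vote` and the tie-breaking rule are hypothetical choices for illustration, not the authors' exact scheme.

```python
from collections import defaultdict


def majority_vote(rankings: list[list[str]], k: int = 10) -> list[str]:
    """Fuse per-model top-k rankings of fact-check IDs by vote count.

    Hypothetical scheme: each model gives one vote to every ID in its top-k.
    Ties on votes are broken by the sum of rank positions (lower is better),
    then by ID so the output is deterministic.
    """
    votes: dict[str, int] = defaultdict(int)
    rank_sum: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        for position, fact_check_id in enumerate(ranking[:k]):
            votes[fact_check_id] += 1
            rank_sum[fact_check_id] += position
    ordered = sorted(votes, key=lambda fid: (-votes[fid], rank_sum[fid], fid))
    return ordered[:k]


# Three hypothetical models, each returning its top-3 fact-check IDs for one post.
rankings = [["fc3", "fc1", "fc7"], ["fc1", "fc3", "fc9"], ["fc1", "fc2", "fc3"]]
print(majority_vote(rankings, k=3))  # ['fc1', 'fc3', 'fc2']
```

Here `fc1` and `fc3` each receive three votes, and `fc1` ranks first because it sits higher in the individual lists. Voting rewards candidates that several models agree on, which is one plausible reason ensembling helps in the noisier crosslingual setting.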
Abstract
This paper describes our system for SemEval-2025 Task 7: Previously Fact-Checked Claim Retrieval. The task requires retrieving relevant fact-checks for a given input claim from the large multilingual MultiClaim dataset, which contains social media posts and fact-checks in several languages. To address this challenge, we first evaluated the zero-shot performance of state-of-the-art English and multilingual retrieval models. We then fine-tuned the most promising systems and used machine translation to improve crosslingual retrieval. Our best model achieved an accuracy of 85% on crosslingual data and 92% on monolingual data.
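Zero-shot retrieval of this kind can be sketched as ranking fact-checks by cosine similarity between embeddings. In the sketch, plain vectors stand in for the output of a retrieval encoder, which is not specified here. We assume that "accuracy" means the fraction of posts with at least one correct fact-check among the top k results (a success@k measure). All function names and the toy data are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], fact_check_vecs: dict[str, list[float]], k: int = 10) -> list[str]:
    """Return the IDs of the k fact-checks most similar to the post embedding."""
    ranked = sorted(fact_check_vecs, key=lambda fid: -cosine(query_vec, fact_check_vecs[fid]))
    return ranked[:k]


def success_at_k(
    queries: dict[str, list[float]],
    gold: dict[str, set[str]],
    fact_check_vecs: dict[str, list[float]],
    k: int = 10,
) -> float:
    """Fraction of posts whose top-k results contain at least one gold fact-check."""
    if not queries:
        return 0.0
    hits = sum(1 for pid, vec in queries.items() if gold[pid] & set(retrieve(vec, fact_check_vecs, k)))
    return hits / len(queries)


# Toy 2-d "embeddings" standing in for encoder outputs.
fact_checks = {"fc_a": [1.0, 0.0], "fc_b": [0.0, 1.0], "fc_c": [0.7, 0.7]}
posts = {"p1": [0.9, 0.1], "p2": [0.1, 0.9]}
gold = {"p1": {"fc_a"}, "p2": {"fc_a"}}
print(success_at_k(posts, gold, fact_checks, k=1))  # 0.5
```

In a crosslingual setup, one option is to machine-translate posts and fact-checks into a common language before encoding, so that the similarity comparison does not depend on the encoder aligning languages on its own.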
