ICPR 2024 Competition on Multilingual Claim-Span Identification

Soham Poddar, Biswajit Paul, Moumita Basu, Saptarshi Ghosh

TL;DR

The paper presents the ICPR 2024 Competition on Multilingual Claim-Span Identification and introduces the HECSI dataset for English and Hindi claim spans, highlighting its multilingual extension and practical focus on explainable claim verification. It documents three competition tracks, token-level evaluation with Macro-F1 and Jaccard metrics, and the participating teams and their approaches, including transformer-based fine-tuning and CAI-style enhancements. Results show strong performance by fine-tuned multilingual models in constrained tracks, yet the organizer baseline remains competitive in the unconstrained multilingual track, underscoring ongoing challenges in cross-lingual transfer and domain augmentation. By publicly releasing HECSI, the work provides a benchmark for future research in multilingual claim-span identification and its role in preventing misinformation on social media.
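
The token-level Macro-F1 and Jaccard evaluation mentioned above can be illustrated with a minimal sketch. It assumes each token carries a binary label (1 for a claim token, 0 otherwise); the competition's exact label scheme, averaging, and edge-case handling are not stated here, so the function names and conventions below are illustrative assumptions rather than the organizers' scoring script.

```python
from typing import List


def f1_for_class(gold: List[int], pred: List[int], cls: int) -> float:
    """F1 score for one class, computed over aligned token labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        # Assumed convention: no true positives gives F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Unweighted mean of the F1 scores of the two classes (0 and 1)."""
    return (f1_for_class(gold, pred, 0) + f1_for_class(gold, pred, 1)) / 2


def jaccard(gold: List[int], pred: List[int]) -> float:
    """Intersection over union of the claim-token positions."""
    gold_idx = {i for i, g in enumerate(gold) if g == 1}
    pred_idx = {i for i, p in enumerate(pred) if p == 1}
    union = gold_idx | pred_idx
    if not union:
        # Assumed convention: both sides empty counts as a perfect match.
        return 1.0
    return len(gold_idx & pred_idx) / len(union)


# Hypothetical labels for one six-token post.
gold = [0, 1, 1, 1, 0, 0]
pred = [0, 0, 1, 1, 1, 0]
print(round(macro_f1(gold, pred), 4))  # 0.6667
print(jaccard(gold, pred))             # 0.5
```

In this example the predicted span is shifted one token to the right of the gold span, so two of the four tokens in the union overlap, which gives a Jaccard score of 0.5.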

Abstract

Social media posts contain many claims, some of which may be misinformation or fake news. Hence, it is crucial to identify claims as a first step towards claim verification. Given the huge number of social media posts, the task of identifying claims needs to be automated. This competition deals with the task of 'Claim Span Identification': given a text, the parts (spans) that correspond to claims are to be identified. This task is more challenging than the traditional binary classification of text into claim or not-claim, and requires state-of-the-art methods in Pattern Recognition, Natural Language Processing and Machine Learning. For this competition, we used a newly developed dataset called HECSI, containing about 8K posts in English and about 8K posts in Hindi with claim-spans marked by human annotators. This paper gives an overview of the competition and the solutions developed by the participating teams.
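
To make the difference between span identification and post-level claim classification concrete, the sketch below turns per-token predictions into contiguous claim spans. It assumes whitespace tokenization and binary token labels (1 for claim, 0 otherwise); the helper `labels_to_spans` and the example post are hypothetical and are not taken from HECSI.

```python
from typing import List, Tuple


def labels_to_spans(tokens: List[str], labels: List[int]) -> List[Tuple[int, int, str]]:
    """Group consecutive tokens labelled 1 into (start, end, text) spans.

    `start` is inclusive and `end` is exclusive, both as token indices.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new span opens here
        elif label != 1 and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None  # the current span is closed
    if start is not None:
        # The post ended while a span was still open.
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Hypothetical post and labels, for illustration only.
tokens = "Breaking : the city will close all schools tomorrow , stay home".split()
labels = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(labels_to_spans(tokens, labels))
# [(2, 9, 'the city will close all schools tomorrow')]
```

A binary classifier would only mark the whole post as containing a claim, whereas the span output isolates the part that a downstream verification step would need to check.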

Paper Structure

This paper contains 7 sections, 2 equations, and 5 tables.