HYBRINFOX at CheckThat! 2024 -- Task 1: Enhancing Language Models with Structured Information for Check-Worthiness Estimation

Géraud Faye, Morgane Casanova, Benjamin Icard, Julien Chanson, Guillaume Gadek, Guillaume Gravier, Paul Égré

TL;DR

The paper addresses check-worthiness estimation for CheckThat! 2024 Task 1 by augmenting RoBERTa embeddings with structured information in the form of subject–predicate–object triples extracted via OpenIE6. The proposed hybrid neural architecture processes both the LM embedding ($768$-dim) and the triple embeddings (each triple encoded as a $768$-dim vector) and merges them for final classification. On the English evaluation data, the approach achieves an F1 of $71.1$ and ranks $12/27$, while Dutch and Arabic show mixed results ($58.9$ and $51.9$, respectively), which points to language-dependent gains and limitations in multilingual settings. The work indicates that incorporating structured information from text can improve check-worthiness estimation over plain LM embeddings, and it identifies future work with newer LLMs, coreference filtering, and interpretability to improve cross-language performance in misinformation contexts.

Abstract

This paper summarizes the experiments and results of the HYBRINFOX team for the CheckThat! 2024 - Task 1 competition. We propose an approach enriching Language Models such as RoBERTa with embeddings produced by triples (subject ; predicate ; object) extracted from the text sentences. Our analysis of the developmental data shows that this method improves the performance of Language Models alone. On the evaluation data, its best performance was in English, where it achieved an F1 score of 71.1 and ranked 12th out of 27 candidates. On the other languages (Dutch and Arabic), it obtained more mixed results. Future research tracks are identified toward adapting this processing pipeline to more recent Large Language Models.
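
As a reminder of what the reported F1 score measures, the following is a minimal sketch. It assumes the score is computed over the positive (check-worthy) class, with label `1` standing for "check-worthy"; the function name `f1_positive` and the toy labels are illustrative only and are not taken from the paper or its data.

```python
def f1_positive(gold, pred, positive=1):
    """F1 over the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): 1 = check-worthy, 0 = not.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
toy_f1 = f1_positive(gold, pred)  # 2 TP, 1 FP, 1 FN
```

A score such as 71.1 corresponds to this quantity expressed as a percentage.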

Paper Structure

This paper contains 11 sections, 3 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Our proposed architecture: adding structured information extracted from the text to enhance the LM embeddings. RoBERTa and OpenIE6 can be switched with other models for non-English languages.
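
The data flow in Figure 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary only says that the two kinds of embeddings are merged, so the mean pooling over triples, the concatenation, and the single linear layer with a sigmoid are all assumptions. Random vectors stand in for the RoBERTa sentence embedding and the encoded OpenIE6 triples, and the names `mean_pool`, `fuse`, and `check_worthiness_score` are hypothetical.

```python
import math
import random

DIM = 768  # embedding size quoted for both the LM vector and each triple vector


def mean_pool(vectors, dim=DIM):
    """Average equal-length vectors; return a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim  # sentence with no extracted triple (assumed handling)
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse(lm_embedding, triple_embeddings):
    """Concatenate the LM vector with the pooled triple vector (assumed merge)."""
    pooled = mean_pool(triple_embeddings, len(lm_embedding))
    return list(lm_embedding) + pooled


def check_worthiness_score(fused, weights, bias=0.0):
    """One linear layer followed by a sigmoid, giving a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder inputs: random vectors instead of real model outputs.
rng = random.Random(0)
lm_vec = [rng.uniform(-1, 1) for _ in range(DIM)]
triples = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
weights = [rng.uniform(-0.01, 0.01) for _ in range(2 * DIM)]  # untrained

fused = fuse(lm_vec, triples)                    # length 2 * DIM = 1536
score = check_worthiness_score(fused, weights)   # untrained, so not meaningful
```

In a real system the weights would be learned on the task's training data, and the placeholder vectors would be replaced by the outputs of the language model and the triple encoder chosen for each language.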