A Commonsense-Infused Language-Agnostic Learning Framework for Enhancing Prediction of Political Polarity in Multilingual News Headlines

Swati Swati, Adrian Mladenić Grobelnik, Dunja Mladenić, Marko Grobelnik

TL;DR

This study tackles the prediction of political polarity in multilingual news headlines, with a focus on low-resource European languages. It introduces a Translate-Retrieve-Translate pipeline that extracts Inferential Commonsense Knowledge (IC_Knwl) using COMET on ATOMIC2020 and translates it back into the target language, then fuses the attended IC_Knwl with multilingual PLMs for bias prediction. The authors release a 62,689-headline dataset across five European languages and demonstrate that IC_Knwl, especially when combined with an attention mechanism, consistently improves performance over headline-only baselines across multiple PLMs. The work highlights both the theoretical value of bridging commonsense reasoning with cross-lingual transfer and the practical potential for journalists and researchers, while also showing that translation quality significantly influences performance for some languages, such as Slovenian. Future directions include expanding data sources, multitask learning, and exploring additional knowledge resources to further enhance multilingual bias detection.

Abstract

Predicting the political polarity of news headlines is a challenging task that becomes even more challenging in a multilingual setting with low-resource languages. To deal with this, we introduce a learning framework that utilises Inferential Commonsense Knowledge via a Translate-Retrieve-Translate strategy. To begin with, we use translation and retrieval to acquire the inferential knowledge in the target language. We then employ an attention mechanism to emphasise important inferences. Finally, we integrate the attended inferences into a multilingual pre-trained language model for the task of bias prediction. To evaluate the effectiveness of our framework, we present a dataset of over 62.6K multilingual news headlines in five European languages annotated with their respective political polarities. We evaluate several state-of-the-art multilingual pre-trained language models since their performance tends to vary across languages (low/high resource). Evaluation results demonstrate that our proposed framework is effective regardless of the models employed. Overall, the best-performing model trained with only headlines achieves 0.90 accuracy and F1, and a 0.83 Jaccard score. With attended knowledge in our framework, the same model shows an increase of 2.2% in accuracy and F1, and 3.6% in Jaccard score. Extending our experiments to individual languages reveals that the models we analyse perform significantly worse for Slovenian than for the other languages in our dataset. To investigate this, we assess the effect of translation quality on prediction performance. This analysis indicates that the disparity in performance is most likely due to poor translation quality. We release our dataset and scripts at https://github.com/Swati17293/KG-Multi-Bias for future research. Our framework has the potential to benefit journalists, social scientists, news producers, and consumers.
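
The three steps described in the abstract (Translate-Retrieve-Translate, attention over the inferences, fusion with the headline representation) can be illustrated with a minimal sketch. It is not the authors' implementation: the scaled dot-product attention, the fusion by concatenation, and every function name here (`translate_retrieve_translate`, `attend`, `fuse`, and the `translate` and `retrieve` callables) are assumptions made for illustration. In practice `translate` would wrap a machine translation system and `retrieve` a COMET-style generator.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def translate_retrieve_translate(
    headline: str,
    lang: str,
    translate: Callable[[str, str, str], str],  # hypothetical: (text, source, target) -> text
    retrieve: Callable[[str], List[str]],       # hypothetical: English text -> inferences
) -> List[str]:
    """Translate to English, retrieve inferences, translate them back."""
    english = translate(headline, lang, "en")
    inferences = retrieve(english)
    return [translate(inference, "en", lang) for inference in inferences]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(headline_vec: Vector, inference_vecs: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention (an assumed choice): weight each inference
    embedding by its similarity to the headline embedding, then pool."""
    scale = math.sqrt(len(headline_vec))
    scores = [sum(h * v for h, v in zip(headline_vec, vec)) / scale for vec in inference_vecs]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, inference_vecs))
        for i in range(len(headline_vec))
    ]
    return weights, pooled


def fuse(headline_vec: Vector, pooled_vec: Vector) -> Vector:
    """Fuse by concatenation (assumed); a classifier head would consume the result."""
    return headline_vec + pooled_vec


if __name__ == "__main__":
    # Toy stand-ins for a translation system and a knowledge retriever.
    toy_translate = lambda text, src, tgt: f"[{tgt}] {text}"
    toy_retrieve = lambda text: [f"{text} / inference {i}" for i in range(2)]
    print(translate_retrieve_translate("headline", "sl", toy_translate, toy_retrieve))

    # Toy embeddings; real ones would come from a multilingual PLM.
    weights, pooled = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, fuse([1.0, 0.0], pooled))
```

In this sketch, the inference whose embedding is most similar to the headline receives the largest attention weight, which is one simple way to "emphasise important inferences" before fusion.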

Paper Structure

This paper contains 30 sections, 10 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: News headlines from (a) Czech and (b) Slovenian news outlets on the topic of "hacker attacks on Russia", with varying political polarities. Inferential Commonsense Knowledge (IC_Knwl) can help improve prediction accuracy by facilitating the acquisition of additional bias cues. (Note: this example shows only a subset of IC_Knwl relations. Image source: https://www.24ur.com/novice/tujina/ukrajina/hekerska-skupina-anonymous-trdi-da-je-vdrla-v-rusko-centralno-banko.html, https://www.novinky.cz/internet-a-pc/bezpecnost/clanek/hackeri-vyhlasili-rusku-valku-vyradili-z-provozu-statni-televizi-rt-40388303, Translation: https://translate.google.com/)
  • Figure 2: Data Collection Framework. We use Media Bias/Fact Check (MBFC) and Event Registry (ER) as the primary data sources in the framework.
  • Figure 3: An overview of our proposed learning framework. To predict political polarity of multilingual news headlines, it combines Inferential Commonsense Knowledge retrieved via the Translate-Retrieve-Translate strategy with multilingual pre-trained language models.
  • Figure 4: A small subset of $IC\_Knwl$ relations generated using $\text{ATOMIC}^{20}_{20}$ as the knowledge base in response to the news headline 'Musk sold Tesla shares for 110 billion'. Nodes in the colours red, green, blue, and orange represent relations depicting social interactions, events, physical entities, and category intersection, respectively.