A Study on Scaling Up Multilingual News Framing Analysis

Syeda Sabrina Akter; Antonios Anastasopoulos

TL;DR

This work tackles the scalability of multilingual news framing analysis by introducing the crowd-sourced SNFC corpus, extending framing datasets through automatic translation to 12 languages, and creating novel Bengali and Portuguese benchmarks. It demonstrates that integrating SNFC with high-quality expert data (MFC) yields meaningful performance gains in both monolingual and multilingual settings, with MaSNFC-filtered data offering strong benefits in data-scarce scenarios. The study also benchmarks large language models, finding that task-specific fine-tuning (e.g., RoBERTa) substantially outperforms zero-shot generative models, highlighting the continued value of specialized training for framing tasks. Overall, the results advance multilingual framing analysis while outlining limitations related to translation quality and cultural context, guiding future collection of diverse, high-quality multilingual data and model adaptation strategies.

Abstract

Media framing is the study of strategically selecting and presenting specific aspects of political issues to shape public opinion. Despite its relevance to almost all societies around the world, research has been limited due to the lack of available datasets and other resources. This study explores the possibility of dataset creation through crowdsourcing, utilizing non-expert annotators to develop training corpora. We first extend framing analysis beyond English news to a multilingual context (12 typologically diverse languages) through automatic translation. We also present a novel benchmark in Bengali and Portuguese on the immigration and same-sex marriage domains. Additionally, we show that a system trained on our crowd-sourced dataset, combined with other existing ones, leads to a 5.32 percentage point increase from the baseline, showing that crowdsourcing is a viable option. Last, we study the performance of large language models (LLMs) for this task, finding that task-specific fine-tuning is a better approach than employing bigger non-specialized models.

Paper Structure (24 sections, 4 figures, 10 tables)

Introduction
Related Work
Dataset Creation
SNFC Training Corpus
Multilinguality
Novel Test Set
Framing Analysis System and Results
Experimental Setup
English Results and Discussion
Filtering of Crowdsourced Data
Multilingual Results and Discussion
mMFC Breakdown per Language
Error Analysis
Generative Language Models
Experimental Setting
...and 9 more sections

Figures (4)

Figure 1: An illustration of sentence-level framing in Portuguese, showing how the specific language of each sentence strategically shapes Political and Equality narratives within the same article.
Figure 2: The label distributions of the MFC and our new Bengali and Portuguese test sets. Note that they differ significantly.
Figure 3: The best model performs very inequitably across languages on mMFC. Accuracy is highest in English (72.1%), followed by Italian and German, while languages from non-Western countries (e.g., Bengali, Hindi, and Chinese) have much lower performance (under 30%).
Figure 4: Confusion matrix of the best model's predictions on the mMFC test set.
