A Study on Scaling Up Multilingual News Framing Analysis

Syeda Sabrina Akter, Antonios Anastasopoulos

TL;DR

This work tackles the scalability of multilingual news framing analysis by introducing the crowd-sourced SNFC corpus, extending framing datasets through automatic translation to 12 languages, and creating novel Bengali and Portuguese benchmarks. It demonstrates that integrating SNFC with high-quality expert data (MFC) yields meaningful performance gains in both monolingual and multilingual settings, with MaSNFC-filtered data offering strong benefits in data-scarce scenarios. The study also benchmarks large language models, finding that task-specific fine-tuning (e.g., RoBERTa) substantially outperforms zero-shot generative models, highlighting the continued value of specialized training for framing tasks. Overall, the results advance multilingual framing analysis while outlining limitations related to translation quality and cultural context, guiding future collection of diverse, high-quality multilingual data and model adaptation strategies.

Abstract

Media framing is the study of strategically selecting and presenting specific aspects of political issues to shape public opinion. Despite its relevance to almost all societies around the world, research has been limited due to the lack of available datasets and other resources. This study explores the possibility of dataset creation through crowdsourcing, utilizing non-expert annotators to develop training corpora. We first extend framing analysis beyond English news to a multilingual context (12 typologically diverse languages) through automatic translation. We also present a novel benchmark in Bengali and Portuguese on the immigration and same-sex marriage domains. Additionally, we show that a system trained on our crowd-sourced dataset, combined with other existing ones, leads to a 5.32 percentage point increase from the baseline, showing that crowdsourcing is a viable option. Last, we study the performance of large language models (LLMs) for this task, finding that task-specific fine-tuning is a better approach than employing bigger non-specialized models.

Paper Structure (24 sections, 4 figures, 10 tables)

Figures (4)

  • Figure 1: An illustration of sentence-level framing in Portuguese, showing how the specific language of each sentence strategically shapes a Political and an Equality narrative within the same article.
  • Figure 2: The label distributions of the MFC and our new Bengali and Portuguese test sets. Note that they differ significantly.
  • Figure 3: The best model performs very inequitably across languages on mMFC. The highest accuracy is in English (72.1%), followed by Italian and German, while languages from non-Western countries (e.g., Bengali, Hindi, and Chinese) perform much worse (under 30%).
  • Figure 4: Confusion matrix of the best model's predictions on the mMFC test set.