Multimodal Analysis of State-Funded News Coverage of the Israel-Hamas War on YouTube Shorts

Daniel Miehling, Sandra Kuebler

Abstract

YouTube Shorts have become central to news consumption on the platform, yet research on how geopolitical events are represented in this format remains limited. To address this gap, we present a multimodal pipeline that combines automatic transcription, aspect-based sentiment analysis (ABSA), and semantic scene classification. The pipeline is first assessed for feasibility and then applied to analyze short-form coverage of the Israel-Hamas war by state-funded outlets. Using over 2,300 conflict-related Shorts and more than 94,000 visual frames, we systematically examine war reporting across major international broadcasters. Our findings reveal that the sentiment expressed in transcripts regarding specific aspects differs across outlets and over time, whereas scene-type classifications reflect visual cues consistent with real-world events. Notably, smaller domain-adapted models outperform large transformers and even LLMs for sentiment analysis, underscoring the value of resource-efficient approaches for humanities research. The pipeline serves as a template for other short-form platforms, such as TikTok and Instagram, and demonstrates how multimodal methods, combined with qualitative interpretation, can characterize sentiment patterns and visual cues in algorithmically driven video environments.

Paper Structure

This paper contains 31 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Multimodal Pipeline for Shorts Analysis.
  • Figure 2: Dependency analysis for aspects 'Netanyahu' and 'Israeli'. Predicted Sentiment: Neg. (0.99).
  • Figure 3: True-positive examples from scene classification (faces of non-publicly known individuals blurred).
  • Figure 4: Examples of challenging cases in semantic scene classification.
  • Figure 5: Sentiment comparison per outlet (log-transformed view counts).
  • ...and 2 more figures