Detecting Cultural Differences in News Video Thumbnails via Computational Aesthetics

Marvin Limpijankit, John Kender

TL;DR

This paper studies cross-cultural differences in news thumbnail aesthetics with a two-step framework: thumbnails are first clustered into visual themes, and 21 handcrafted visual features are then compared across US and Chinese sources covering COVID-19 and the Ukraine conflict. The method uses CLIP and BERTopic for theme discovery and Tag2Text for thematic tagging, paired with a feature set spanning color, geometry, texture, and object content derived from CNNs and YOLOv5. US thumbnails tend to look more deliberate and professional, with closer portraits, darker tones, and less colorful but more saturated palettes, while Chinese thumbnails are brighter, more colorful, and more often depict longer-shot scenes; the authors attribute most of these differences to cultural preferences and framing strategies. The work offers a content-controlled, cross-cultural aesthetic baseline for interpreting visual propaganda and disinformation, one that can be extended to larger, multi-domain datasets and to entertainment media.

Abstract

We propose a two-step approach for detecting differences in the style of images across sources of differing cultural affinity, where images are first clustered into finer visual themes based on content before their aesthetic features are compared. We test this approach on 2,400 YouTube video thumbnails taken equally from two U.S. and two Chinese YouTube channels, and relating equally to COVID-19 and the Ukraine conflict. Our results suggest that while Chinese thumbnails are less formal and more candid, U.S. channels tend to use more deliberate, proper photographs as thumbnails. In particular, U.S. thumbnails are less colorful, more saturated, darker, more finely detailed, less symmetric, sparser, less varied, and more up close and personal than Chinese thumbnails. We suggest that most of these differences reflect cultural preferences, and that our methods and observations can serve as a baseline against which suspected visual propaganda can be computed and compared.
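The two-step idea in the abstract (group by visual theme first, then compare aesthetics within each theme) can be illustrated with a minimal sketch. This is not the authors' pipeline: theme labels are assumed to be already assigned (standing in for the clustering step), only two simple HSV features are computed instead of the paper's full feature set, and the function names, theme names, and toy pixel values are all hypothetical.

```python
import colorsys
from collections import defaultdict
from statistics import mean


def aesthetic_features(pixels):
    """Mean HSV saturation and brightness for a list of (r, g, b) pixels in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    return {
        "saturation": mean(s for _, s, _ in hsv),
        "brightness": mean(v for _, _, v in hsv),
    }


def compare_within_themes(thumbnails):
    """Return {theme: {feature: mean_US - mean_China}}.

    Each thumbnail is a dict with keys 'theme', 'source', and 'pixels'.
    Themes that lack one of the two sources are skipped, so every
    comparison is made between thumbnails with similar content.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for thumb in thumbnails:
        features = aesthetic_features(thumb["pixels"])
        grouped[thumb["theme"]][thumb["source"]].append(features)

    diffs = {}
    for theme, by_source in grouped.items():
        if "US" not in by_source or "China" not in by_source:
            continue
        diffs[theme] = {
            name: mean(f[name] for f in by_source["US"])
            - mean(f[name] for f in by_source["China"])
            for name in ("saturation", "brightness")
        }
    return diffs


# Made-up toy data for illustration only; not taken from the paper.
toy = [
    {"theme": "press briefing", "source": "US",
     "pixels": [(40, 20, 20), (60, 30, 30)]},
    {"theme": "press briefing", "source": "China",
     "pixels": [(200, 180, 160), (220, 200, 180)]},
    {"theme": "street scene", "source": "China",
     "pixels": [(120, 200, 120)]},
]
result = compare_within_themes(toy)
```

Comparing only within a shared theme is what keeps content roughly fixed, so a remaining difference in a feature is more plausibly a matter of style than of subject matter. A fuller analysis would also need a significance test per feature and theme, which this sketch omits.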

Paper Structure

This paper contains 46 sections, 1 equation, 10 figures, 2 tables.

Figures (10)

  • Figure 1: An example of Photo-Space visualization [bhargava2020mapping], where physical placement captures thematic relations, and image borders are color-coded by political affinity.
  • Figure 2: Example visual themes from the Ukraine war.
  • Figure 3: Ukraine and COVID themes, China vs. U.S.
  • Figure 4: Examples of top- and bottom-ranked thumbnails for three of the CNN-based features.
  • Figure 5: Like-rate distribution across events and channels.
  • ...and 5 more figures