Visual Objectification in Films: Towards a New AI Task for Video Interpretation

Julie Tores; Lucile Sassatelli; Hui-Yin Wu; Clement Bergman; Lea Andolfi; Victor Ecrement; Frederic Precioso; Thierry Devars; Magali Guaresi; Virginie Julliard; Sarah Lecossais

TL;DR

This work defines an interpretive video task, detecting character objectification in films, and introduces ObyGaze12, a densely annotated dataset of 1914 clips from 12 films, together with a thesaurus of eight objectification concepts drawn from film studies and psychology. It benchmarks pre-trained vision-language models (e.g., X-CLIP) and concept-based approaches to assess feasibility, and uses post-hoc concept bottleneck models to analyze concept representation and interpretability. The task proves feasible but challenging: hard negatives improve classification, several concepts (notably Type of shot, Look, Posture, and Appearance) remain difficult to represent, and generalization to unseen movies remains a key challenge. The dataset, code, and analyses provide a foundation for explainable, high-level analysis of gender representation in cinema and invite further work on concept representations and temporal modeling for video interpretation.

Abstract

In film gender studies, the concept of the 'male gaze' refers to the way characters are portrayed on screen as objects of desire rather than as subjects. In this article, we introduce a novel video-interpretation task: detecting character objectification in films. The purpose is to reveal and quantify the use of the complex temporal patterns that cinema employs to produce the cognitive perception of objectification. We introduce the ObyGaze12 dataset, made of 1914 movie clips densely annotated by experts for objectification concepts identified in film studies and psychology. We evaluate recent vision models, show that the task is feasible, and use concept bottleneck models to show where challenges remain. Our new dataset and code are made available to the community.

Paper Structure (36 sections, 5 equations, 6 figures, 5 tables)

This paper contains 36 sections, 5 equations, 6 figures, 5 tables.

Introduction
Related works
Visual biases in film datasets
Interpretive-level tasks and dataset creation
Approaches to video and movie understanding
Pre-trained models for video understanding
Movie-related tasks
Concept-based models
Data and methods
A thesaurus of objectification
Data selection
Data annotation
Data processing and fusion
Analysis of the ObyGaze12 dataset
Experiments
...and 21 more sections

Figures (6)

Figure 1: In modern film media, the unequal characterization of gender on screen frequently evokes concepts of objectification, such as (A) unequal gaze (Pulp Fiction, 1994), (B) nudity and submissive postures (Pulp Fiction, 1994), (C) animalisation or infantilisation (Marley and Me, 2008), and (D) transparent clothing, camera framing, domestic gender roles, and voyeurism (Gone Girl, 2014).
Figure 2: Distribution of visual factors annotated for each level of objectification (HN = Hard negative, NS = Not sure, S = Sure). The percentage of the dataset for each level of objectification and the average number of concepts per clip are also shown. (Best viewed in color.)
Figure 3: For every concept, F1-score of the best linear SVM selected to define the CAV of this concept. Positive samples (S and HN clips with the concept) must be separated from: [non-hatched bars] negative samples made of EN (Easy Negative) clips only, or [hatched bars] negative samples made of EN clips plus S and HN clips without the concept.
Figure 4: The annotation and data processing procedure is as follows. (1) Two experts annotate each film, with free delimitation (Annotation 1 and Annotation 2). (2) Annotations are projected onto the MovieGraphs delimitation (dashed gray line), taking the highest level of objectification while enforcing a minimum overlap threshold of 20% (Projection 1 and Projection 2). (3) Projections are merged (Merged), taking the highest level of objectification and merging the concepts only for the same level of objectification.
Figure 5: Decision tree trained for the objectification detection task of Easy Negative vs. Sure, fed with embedding similarities to the CAVs obtained by contrasting clips with a concept against Easy Negative examples. Orange (resp. blue) shaded boxes represent a majority of negative (resp. positive) clip examples (i.e., without or with objectification).
...and 1 more figure
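
The projection and merging steps described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustrative reading of the caption, not the authors' released code. Several points are assumptions: the names `Segment`, `project`, and `merge` are hypothetical; the levels are assumed to be ordered EN < HN < NS < S; the 20% overlap is assumed to be measured against the clip duration; and a clip with no sufficiently overlapping annotation is assumed to default to Easy Negative.

```python
from dataclasses import dataclass

# Assumed ordering of objectification levels, from lowest to highest.
LEVELS = ["EN", "HN", "NS", "S"]
RANK = {name: i for i, name in enumerate(LEVELS)}
MIN_OVERLAP = 0.20  # 20% threshold from the caption; the denominator is an assumption


@dataclass(frozen=True)
class Segment:
    """A time span (seconds) with an objectification level and a set of concepts."""
    start: float
    end: float
    level: str = "EN"
    concepts: frozenset = frozenset()


def overlap_fraction(clip, ann):
    """Fraction of the clip's duration covered by the annotation (assumed definition)."""
    shared = min(clip.end, ann.end) - max(clip.start, ann.start)
    return max(0.0, shared) / (clip.end - clip.start)


def project(clip, annotations):
    """Project one expert's free-form annotations onto a fixed clip."""
    hits = [a for a in annotations if overlap_fraction(clip, a) >= MIN_OVERLAP]
    if not hits:
        return Segment(clip.start, clip.end)  # assumed default: Easy Negative
    top = max(RANK[a.level] for a in hits)
    # Keep only the concepts attached to the highest level found.
    concepts = frozenset().union(*(a.concepts for a in hits if RANK[a.level] == top))
    return Segment(clip.start, clip.end, LEVELS[top], concepts)


def merge(p1, p2):
    """Merge two projections of the same clip, keeping the highest level."""
    if p1.level == p2.level:
        # Same level: concepts from both experts are combined.
        return Segment(p1.start, p1.end, p1.level, p1.concepts | p2.concepts)
    return p1 if RANK[p1.level] > RANK[p2.level] else p2


clip = Segment(0.0, 10.0)
expert_1 = [Segment(1.0, 4.0, "S", frozenset({"Look"}))]       # covers 30% of the clip
expert_2 = [Segment(8.5, 12.0, "NS", frozenset({"Posture"}))]  # covers 15%: below threshold
merged = merge(project(clip, expert_1), project(clip, expert_2))
print(merged.level, sorted(merged.concepts))
```

In this toy example, the second expert's annotation overlaps the clip by less than 20%, so only the first expert's label and concept survive. Measuring overlap against the annotation's duration rather than the clip's would be an equally plausible reading of the caption and only requires changing the denominator in `overlap_fraction`.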
