Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores, Lucile Sassatelli, Hui-Yin Wu, Clement Bergman, Lea Andolfi, Victor Ecrement, Frederic Precioso, Thierry Devars, Magali Guaresi, Virginie Julliard, Sarah Lecossais
TL;DR
This work defines an interpretive video task to detect character objectification in films and introduces ObyGaze12, a densely annotated dataset with 1914 clips from 12 films and a thesaurus of eight objectification concepts assembled from film studies and psychology. It benchmarks pre-trained vision-language models (e.g., X-CLIP) and concept-based approaches to assess feasibility, while employing post-hoc concept bottleneck models to analyze concept representation and interpretability. Findings indicate the task is feasible but challenging, with hard negatives improving classification, and several concepts (notably Type of shot, Look, Posture, Appearance) remaining difficult to represent; generalization across unseen movies remains a key challenge. The dataset, code, and analyses provide a foundation for explainable, high-level analysis of gender representation in cinema and invite further improvements in concept representations and temporal modeling for video interpretation.
Abstract
In film gender studies, the concept of 'male gaze' refers to the way characters are portrayed on-screen as objects of desire rather than subjects. In this article, we introduce a novel video-interpretation task: detecting character objectification in films. The purpose is to reveal and quantify the use of complex temporal patterns employed in cinema to produce the cognitive perception of objectification. We introduce the ObyGaze12 dataset, made of 1914 movie clips densely annotated by experts for objectification concepts identified in film studies and psychology. We evaluate recent vision models, show the feasibility of the task, and use concept bottleneck models to identify where the challenges remain. Our new dataset and code are made available to the community.
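
To make the post-hoc concept bottleneck idea mentioned above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that clip embeddings from a frozen pre-trained model are already available (synthetic vectors stand in for them here), estimates one direction per concept as a simple difference of class means (a simplification; concept directions are often learned with a linear classifier instead), projects each clip onto those directions, and fits an interpretable linear classifier on the resulting concept scores. All function names, dimensions, and the toy labelling rule are hypothetical.

```python
import numpy as np


def concept_vectors(emb, concept_labels):
    """One unit direction per concept: mean(positive clips) - mean(negative clips)."""
    vecs = []
    for k in range(concept_labels.shape[1]):
        pos = emb[concept_labels[:, k] == 1].mean(axis=0)
        neg = emb[concept_labels[:, k] == 0].mean(axis=0)
        v = pos - neg
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)  # shape: (n_concepts, dim)


def project(emb, vecs):
    """Concept scores: dot product of each embedding with each concept direction."""
    return emb @ vecs.T  # shape: (n_clips, n_concepts)


def fit_logistic(x, y, lr=0.5, steps=1000):
    """Plain gradient-descent logistic regression on the concept scores."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def demo(seed=0, n_clips=400, dim=32, n_concepts=8):
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins (assumption): random concept annotations and
    # embeddings built as a noisy sum of hidden concept directions.
    true_dirs = rng.normal(size=(n_concepts, dim))
    concept_labels = rng.integers(0, 2, size=(n_clips, n_concepts))
    emb = concept_labels @ true_dirs + 0.5 * rng.normal(size=(n_clips, dim))
    # Toy clip-level label (assumption): positive when at least half of the
    # concepts are present.
    y = (concept_labels.sum(axis=1) >= n_concepts / 2).astype(float)

    vecs = concept_vectors(emb, concept_labels)
    scores = project(emb, vecs)
    # Standardize the scores so gradient descent is well conditioned.
    scores = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    w, b = fit_logistic(scores, y)
    acc = float((((scores @ w + b) > 0).astype(float) == y).mean())
    return vecs, scores, w, acc


if __name__ == "__main__":
    _, _, weights, accuracy = demo()
    print("per-concept weights:", np.round(weights, 2))
    print("training accuracy:", round(accuracy, 3))
```

Because the final classifier only sees one score per concept, its weights can be read as the contribution of each concept to the decision, which is the property that makes this family of models useful for analyzing which concepts are well or poorly represented.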
