Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses
David Doukhan, Lena Dodson, Manon Conan, Valentin Pelloin, Aurélien Clamouse, Mélina Lepape, Géraldine Van Hille, Cécile Méadel, Marlène Coulomb-Gully
TL;DR
The paper tackles the problem of quantifying gender representation in TV and radio by comparing automatic information-extraction descriptors (speech time, facial exposure, and quotations) with manual ARCOM channel reports on a 32k-hour French broadcast corpus from 2023. It employs a multi-modal pipeline including inaSpeechSegmenter for speaker gender, WhisperX-aligned transcripts, INSEE-based first-name gender attribution, and inaFaceAnalyzer for face detection to derive WPR, WRE, WSR, WFR, and WQR metrics. Across all descriptors, women are underrepresented, and manual channel reports generally yield higher presence estimates than automatic methods, with ads showing notably higher female representation and news exhibiting distinctive dynamics in TV. The study identifies program category, channel, and speaker gender as significant factors (e.g., $\eta^2$ around 0.01–0.034) influencing references to women, and discusses limitations such as reliance on binary gender categorization and first-name proxies, suggesting avenues for more nuanced, non-binary analyses and methodological refinements. The findings have implications for media accountability, policy evaluation, and the development of more robust, bias-aware automatic monitoring tools.
Abstract
This study investigates the relationship between automatic information extraction descriptors and manual analyses to describe gender representation disparities in TV and radio. Automatic descriptors, including speech time, facial categorization, and speech transcriptions, are compared with channel reports on a 32,000-hour corpus of French broadcasts from 2023. Findings reveal systemic gender imbalances, with women underrepresented compared to men across all descriptors. Notably, manual channel reports show a higher presence of women than automatic estimates, and references to women are lower than women's speech time. Descriptors share common dynamics across high- and low-audience periods, war coverage, and private versus public channels. While women are more visible than audible in French TV, this trend is inverted in news, where unseen journalists depict male protagonists. A statistical test shows three main effects influencing references to women: program category, channel, and speaker gender.
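To make the speech-time descriptor concrete, the following is a minimal sketch of how a women's share of speech time could be derived from gender-labelled speech segments. It is not the authors' implementation: the `(label, start, end)` tuple layout, the label names, the function name `women_speech_share`, and the choice of normalising by female plus male speech only are all assumptions made for illustration.

```python
from collections import defaultdict


def women_speech_share(segments):
    """Return female speech time divided by (female + male) speech time.

    `segments` is an iterable of (label, start_sec, end_sec) tuples.
    This layout and the label names are hypothetical, chosen for the sketch.
    Segments with any other label (music, noise, ...) are left out of the ratio.
    Returns None when there is no gendered speech at all.
    """
    totals = defaultdict(float)
    for label, start, end in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start} > {end}")
        totals[label] += end - start

    gendered = totals["female"] + totals["male"]
    if gendered == 0:
        return None
    return totals["female"] / gendered


# Toy input: 40 s of female speech, 70 s of male speech, 30 s of music.
example = [
    ("female", 0.0, 30.0),
    ("male", 30.0, 100.0),
    ("music", 100.0, 130.0),
    ("female", 130.0, 140.0),
]
share = women_speech_share(example)
print(f"women's share of speech time: {share:.3f}")
```

Excluding non-speech segments from the denominator keeps the ratio comparable between music-heavy and talk-heavy programs, which is one plausible design choice among several.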
