Television Discourse Decoded: Comprehensive Multimodal Analytics at Scale
Anmol Agarwal, Pratyush Priyadarshi, Shiven Sinha, Shrey Gupta, Hitkul Jangra, Ponnurangam Kumaraguru, Kiran Garimella
TL;DR
The paper tackles the challenge of analyzing televised debates at scale by introducing a comprehensive multimodal analytics toolkit that fuses computer vision, speech-to-text, and NLP to transcribe, diarize, and analyze thousands of YouTube debates from a major Indian prime-time show. It builds a large-scale dataset (2,087 hours across 3,000 videos) and deploys a hybrid annotation pipeline (including LLM-assisted labeling) to quantify bias, gender representation, and incivility through metrics such as topic bias toward the ruling party, underrepresentation of women, overlapping speech, toxicity, and shouting. Key findings reveal a pro-ruling-party bias, persistent gender imbalance, and elevated incivility, with shouting averaging about 9% of debate duration and toxicity concentrated on sensitive topics; the work also demonstrates generalizability to other English-language debates. The study contributes a scalable methodology and openly shares code and data to catalyze further research in multimedia discourse analysis, with implications for media ethics, democratic deliberation, and policy.
Abstract
In this paper, we tackle the complex task of analyzing televised debates, with a focus on a prime-time news debate show from India. Previous methods, which often relied solely on text, fall short in capturing the multimodal essence of these debates. To address this gap, we introduce a comprehensive automated toolkit for large-scale multimedia analysis. Using state-of-the-art computer vision and speech-to-text methods, we transcribe, diarize, and analyze thousands of YouTube videos of the show. These debates are a central part of Indian media but have been criticized for compromised journalistic integrity and excessive dramatization. Our toolkit provides concrete metrics to assess bias and incivility, capturing a comprehensive multimedia perspective that includes text, audio utterances, and video frames. Our findings reveal significant biases in topic selection and panelist representation, along with alarming levels of incivility. This work offers a scalable, automated approach for future research in multimedia analysis, with profound implications for the quality of public discourse and democratic debate. To catalyze further research in this area, we also release the code, the collected dataset, and a supplemental PDF.
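To make one of the incivility metrics concrete, the sketch below shows how an overlapping-speech share could be computed from diarization output. It is an illustration only, not the authors' implementation: the `Segment` tuple layout, the function name `overlap_fraction`, and the example timings are all assumptions, and the paper's exact definition of overlap may differ.

```python
from typing import List, Tuple

# Hypothetical diarization record: (speaker label, start in seconds, end in seconds).
Segment = Tuple[str, float, float]


def overlap_fraction(segments: List[Segment], total_duration: float) -> float:
    """Return the share of total_duration in which two or more segments are active."""
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")

    # Turn each segment into a +1 event at its start and a -1 event at its end.
    events = []
    for _speaker, start, end in segments:
        if end > start:  # skip empty or malformed segments
            events.append((start, 1))
            events.append((end, -1))

    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so segments that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0          # number of segments currently active
    overlapped = 0.0    # accumulated time with at least two active segments
    prev_time = 0.0
    for time, delta in events:
        if active >= 2:
            overlapped += time - prev_time
        active += delta
        prev_time = time

    return overlapped / total_duration


# Made-up example: three speakers in a 20-second clip.
example = [("A", 0.0, 10.0), ("B", 5.0, 15.0), ("C", 14.0, 20.0)]
print(overlap_fraction(example, total_duration=20.0))
```

The sweep over sorted start and end events keeps the computation at O(n log n) in the number of segments, which matters when the same measure is applied across thousands of videos. The sketch assumes that each speaker's own segments do not overlap one another; if they can, segments should be merged per speaker first.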
