A Novel BERT-based Classifier to Detect Political Leaning of YouTube Videos based on their Titles
Nouar AlDahoul, Talal Rahwan, Yasir Zaki
TL;DR
The study addresses the problem of identifying the political leaning of YouTube videos using only their titles. It compares Word2Vec, GloVe, and fine-tuned BERT classifiers on a large dataset of 10.2 million titles labeled into six classes (Far Left, Left, Center, Anti-Woke, Right, Far Right), finding that fine-tuned BERT achieves the highest accuracy (75%) and F1 score (77%). The approach is further validated on thousands of titles from 15 prominent news agency channels, where predictions are largely consistent with the ground-truth leanings reported by AllSides. Limitations include the use of channel-based labels; future work could incorporate video transcripts and extend the approach to other platforms.
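The accuracy and F1 figures above are standard multi-class metrics over the six labels. As a minimal sketch, assuming a macro-averaged F1 (the summary does not state which averaging was used), the two metrics can be computed from title-level predictions as follows. The function names and toy labels are hypothetical and for illustration only.

```python
from collections import Counter

# The six political-leaning classes named in the paper.
LABELS = ["Far Left", "Left", "Center", "Anti-Woke", "Right", "Far Right"]


def accuracy(y_true, y_pred):
    """Fraction of titles whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # true class t was missed
    scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        if precision + recall == 0:
            scores.append(0.0)
        else:
            scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


# Hypothetical toy example: one "Far Right" title is misclassified as "Right".
y_true = ["Far Left", "Left", "Center", "Anti-Woke", "Right", "Far Right"]
y_pred = ["Far Left", "Left", "Center", "Anti-Woke", "Right", "Right"]
print(accuracy(y_true, y_pred))  # 5/6, about 0.833
print(macro_f1(y_true, y_pred))  # 7/9, about 0.778
```

Macro averaging weights every class equally, so a rarely predicted class such as Far Right pulls the score down as much as a frequent one; a weighted average would instead scale each class by its support.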
Abstract
A quarter of US adults regularly get their news from YouTube. Yet, despite the massive amount of political content available on the platform, no classifier has so far been proposed to identify the political leaning of YouTube videos. To fill this gap, we propose a novel classifier based on BERT, a language model from Google, that classifies YouTube videos based solely on their titles into six categories: Far Left, Left, Center, Anti-Woke, Right, and Far Right. We used a public dataset of 10 million YouTube video titles (spanning various categories) to train and validate the proposed classifier. We compared the classifier against several alternatives trained on the same dataset and found that ours achieves the highest accuracy (75%) and the highest F1 score (77%). To further validate its performance, we collected videos from the YouTube channels of numerous prominent news agencies with widely known political leanings, such as Fox News and the New York Times, and applied our classifier to their video titles. In the vast majority of cases, the predicted political leaning matches that of the news agency.
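The abstract compares title-level predictions with a news agency's known leaning, which requires turning many per-title labels into one channel-level label. The text does not say how this aggregation is done; the sketch below assumes a simple majority vote over a channel's predicted titles, plus a per-class distribution for inspection. The function names and example predictions are hypothetical.

```python
from collections import Counter


def channel_leaning(predicted_labels):
    """Return the most frequent predicted class among a channel's titles.

    Majority voting is an assumed aggregation rule, not the paper's stated method.
    Ties go to the label encountered first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not predicted_labels:
        raise ValueError("channel has no predicted titles")
    label, _count = Counter(predicted_labels).most_common(1)[0]
    return label


def leaning_distribution(predicted_labels):
    """Fraction of a channel's titles assigned to each predicted class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


# Hypothetical per-title predictions for one channel.
preds = ["Right", "Right", "Center", "Far Right", "Right"]
print(channel_leaning(preds))       # Right
print(leaning_distribution(preds))  # {'Right': 0.6, 'Center': 0.2, 'Far Right': 0.2}
```

The distribution is often more informative than the single majority label, since a channel whose titles split between adjacent classes (for example Right and Far Right) can still be judged broadly consistent with a reference rating.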
