A Novel BERT-based Classifier to Detect Political Leaning of YouTube Videos based on their Titles

Nouar AlDahoul, Talal Rahwan, Yasir Zaki

TL;DR

The study addresses the problem of identifying the political leaning of YouTube videos using only their titles. It compares Word2Vec- and GloVe-based classifiers with a fine-tuned BERT classifier on a large dataset of 10.2 million titles labeled into six classes (Far Left, Left, Center, Anti-Woke, Right, Far Right). Fine-tuned BERT achieves the highest accuracy (75%) and F1 score (77%). The approach is further validated on thousands of titles from 15 prominent news agency channels, and its predictions are largely consistent with the AllSides ground-truth leanings of those channels. Limitations include the reliance on channel-based labeling. Future work could incorporate video transcripts and data from other platforms.

Abstract

A quarter of US adults regularly get their news from YouTube. Yet despite the massive amount of political content on the platform, no classifier has so far been proposed to identify the political leaning of YouTube videos. To fill this gap, we propose a novel classifier based on BERT, a language model from Google, that classifies YouTube videos based solely on their titles into six categories: Far Left, Left, Center, Anti-Woke, Right, and Far Right. We use a public dataset of 10 million YouTube video titles (under various categories) to train and validate the proposed classifier. We compare it against several alternatives trained on the same dataset and find that our classifier achieves the highest accuracy (75%) and the highest F1 score (77%). To further validate its performance, we collect videos from the YouTube channels of numerous prominent news agencies with widely known political leanings, such as Fox News and the New York Times, and apply our classifier to their video titles. In the vast majority of cases, the predicted political leaning matches that of the news agency.
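Both reported metrics can be read off a six-class confusion matrix such as the one in Figure 3. The section does not say which F1 averaging the authors used, so this minimal plain-Python sketch computes both a macro average (unweighted mean over classes) and a support-weighted average. The function names and the toy labels are hypothetical illustrations, not the paper's code or data.

```python
# The six classes used in the paper.
LABELS = ["Far Left", "Left", "Center", "Anti-Woke", "Right", "Far Right"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows index the true class, columns the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of titles on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_f1(matrix):
    """F1 = 2PR / (P + R) for each class, using 0 when it is undefined."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(matrix[k]) - tp                        # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of per-class F1 over all six classes."""
    scores = per_class_f1(matrix)
    return sum(scores) / len(scores)


def weighted_f1(matrix):
    """Per-class F1 weighted by each class's true support."""
    support = [sum(row) for row in matrix]
    total = sum(support)
    scores = per_class_f1(matrix)
    return sum(s * w for s, w in zip(scores, support)) / total if total else 0.0


# Toy labels, invented for illustration only.
y_true = ["Left", "Left", "Center", "Right", "Far Right", "Anti-Woke"]
y_pred = ["Left", "Far Left", "Center", "Right", "Right", "Anti-Woke"]
m = confusion_matrix(y_true, y_pred)
print(f"accuracy={accuracy(m):.3f} macro_f1={macro_f1(m):.3f} weighted_f1={weighted_f1(m):.3f}")
# accuracy=0.667 macro_f1=0.556 weighted_f1=0.667
```

Macro averaging gives rare classes the same weight as frequent ones. On an imbalanced label distribution like the one in Figure 2, the two averages can therefore diverge noticeably.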

Paper Structure

This paper contains 1 section, 4 figures, and 9 tables.

Table of Contents

  1. Introduction

Figures (4)

  • Figure 1: Experimental setup. An illustration of the different stages of our experiment. In Stage 1, the labelled video titles are prepared and cleaned. In Stage 2, the classification model is designed, trained, and validated using the video title dataset. In Stage 3, the model is tested on a separate set of video titles to evaluate its performance. Finally, in Stage 4, video titles collected from 15 YouTube channels are used for model validation.
  • Figure 2: Distribution of categories in our dataset. The left plot depicts the distribution of categories across the entire dataset, while the right plot depicts the distribution in the test set.
  • Figure 3: Confusion Matrix of predictions made by our YouTube Political Leaning Classifier.
  • Figure 4: Distribution of political leaning predictions of videos in 15 YouTube channels. The left, center, and right columns correspond to channels whose ground truth political leaning is Left, Center, and Right, respectively.
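Figure 4 summarizes, per channel, how the classifier's predictions are spread across the six classes. One way to turn such a distribution into a channel-level verdict is to collapse the six classes into coarse Left/Center/Right buckets and take the majority. This sketch does that under explicit assumptions: the bucket mapping (COARSE), the majority-vote rule, and the channel names are hypothetical and are not the authors' procedure. "Anti-Woke" is deliberately left unmapped rather than assigned to a bucket by guesswork.

```python
from collections import Counter

LABELS = ["Far Left", "Left", "Center", "Anti-Woke", "Right", "Far Right"]

# HYPOTHETICAL mapping from the six classes to coarse Left/Center/Right buckets.
# "Anti-Woke" is intentionally absent and is counted as "Other".
COARSE = {
    "Far Left": "Left",
    "Left": "Left",
    "Center": "Center",
    "Right": "Right",
    "Far Right": "Right",
}


def channel_distribution(predictions):
    """Fraction of a channel's titles assigned to each of the six classes."""
    total = len(predictions)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    counts = Counter(predictions)
    return {label: counts[label] / total for label in LABELS}


def majority_bucket(predictions, mapping=COARSE):
    """Most frequent coarse bucket among a channel's predictions (hypothetical rule)."""
    if not predictions:
        return None
    buckets = Counter(mapping.get(p, "Other") for p in predictions)
    # Ties are not handled specially: most_common keeps first-seen order.
    return buckets.most_common(1)[0][0]


def channel_agreement(channels, ground_truth):
    """Whether each channel's majority bucket matches its ground-truth leaning."""
    return {
        name: majority_bucket(preds) == ground_truth[name]
        for name, preds in channels.items()
    }


# Toy usage with made-up channels and predictions (not data from the paper).
channels = {
    "Channel A": ["Left", "Far Left", "Center", "Left"],
    "Channel B": ["Right", "Anti-Woke", "Far Right", "Center"],
    "Channel C": ["Left", "Center", "Left"],
}
ground_truth = {"Channel A": "Left", "Channel B": "Right", "Channel C": "Center"}

print(channel_distribution(channels["Channel A"]))
print(channel_agreement(channels, ground_truth))
# {'Channel A': True, 'Channel B': True, 'Channel C': False}
```

A majority vote is easy to read but discards how mixed a channel's predictions are. Reporting the full distribution, as Figure 4 does, keeps that information.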