Beyond the Link: Assessing LLMs' Ability to Classify Political Content across Global Media
Alejandro De La Fuente-Cuesta, Alberto Martinez-Serra, Nienke Visscher, Laia Castro, Ana S. Cardenal
TL;DR
The paper investigates whether large language models (LLMs) can reliably classify political content (PC) from URLs alone, across five countries and multiple languages, and compares URL-only classification with full-text classification, using human-annotated labels as ground truth. It evaluates a diverse set of LLMs under zero-shot prompts that include an abstention option, and shows that URL-based classification often achieves high balanced accuracy and precision, making it a scalable alternative when full text is unavailable. However, the study uncovers a systematic bias: LLMs tend to overclassify centrist news as political, which can distort estimates of exposure and polarisation if left unchecked. The authors provide methodological recommendations, including the use of abstention, cross-validation with human coding, and a choice of model tailored to the research aims. Overall, the work advances political communication methods by enabling rapid, cross-language detection of PC from URLs, with practical implications for large-scale media analysis.
Abstract
The use of large language models (LLMs) is becoming common in political science and digital media research. While LLMs have demonstrated ability in labelling tasks, their effectiveness in classifying Political Content (PC) from URLs remains underexplored. This article evaluates whether LLMs can accurately distinguish PC from non-PC using both the text and the URLs of news articles across five countries (France, Germany, Spain, the UK, and the US) and their respective languages. Using cutting-edge models, we benchmark their performance against human-coded data to assess whether URL-level analysis can approximate full-text analysis. Our findings show that URLs embed relevant information and can serve as a scalable, cost-effective alternative for identifying PC. However, we also uncover systematic biases: LLMs seem to overclassify centrist news as political, leading to false positives that may distort further analyses. We conclude by outlining methodological recommendations on the use of LLMs in political science research.
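
To make the benchmarking idea concrete, the following is a minimal sketch of how binary LLM labels with an abstention option could be scored against human coding. It is not the authors' evaluation code. The function name `score_with_abstention`, the encoding (1 = PC, 0 = non-PC, `None` = abstention), and the choice to exclude abstentions from the metrics while reporting them as coverage are all assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence


def score_with_abstention(
    human: Sequence[int],
    model: Sequence[Optional[int]],
) -> Dict[str, float]:
    """Score model labels against human labels (hypothetical helper).

    human: 1 = political content, 0 = non-political content.
    model: same encoding, or None where the model abstained.
    Assumption: abstentions are dropped before computing the metrics
    and reported separately as coverage.
    """
    if len(human) != len(model):
        raise ValueError("human and model must have the same length")

    # Keep only the items on which the model committed to a label.
    pairs = [(h, m) for h, m in zip(human, model) if m is not None]

    tp = sum(1 for h, m in pairs if h == 1 and m == 1)
    tn = sum(1 for h, m in pairs if h == 0 and m == 0)
    fp = sum(1 for h, m in pairs if h == 0 and m == 1)
    fn = sum(1 for h, m in pairs if h == 1 and m == 0)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a class is empty.
        return num / den if den else float("nan")

    recall = ratio(tp, tp + fn)       # share of PC items found
    specificity = ratio(tn, tn + fp)  # share of non-PC items kept out

    return {
        "coverage": ratio(len(pairs), len(human)),
        "precision": ratio(tp, tp + fp),
        "balanced_accuracy": (recall + specificity) / 2,
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Toy labels, invented for this example only.
human_labels = [1, 1, 0, 0, 0, 1]
model_labels = [1, 0, 1, 0, None, 1]
result = score_with_abstention(human_labels, model_labels)
```

Reporting the false-positive rate alongside balanced accuracy is one way to surface the overclassification pattern described above; computing it separately per outlet or ideological group would show whether it concentrates in centrist news.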
