PopBERT. Detecting populism and its host ideologies in the German Bundestag
L. Erhard, S. Hanke, U. Remer, A. Falenska, R. Heiberger
TL;DR
PopBERT is a transformer-based classifier that detects populist language and its host ideologies in German Bundestag debates (2013–2021). The authors build an expert-annotated dataset of 8,795 sentences labeled for anti-elitism, people-centrism, and left-wing or right-wing host ideology, and use it to train a multilabel classifier based on GBERT Large. The model achieves strong predictive performance on each dimension, matches the party rankings of expert surveys (CHES) when its predictions are aggregated, and correctly detects prototypical populist statements in out-of-sample text. The work provides a scalable tool for dynamic analysis of populist rhetoric, offers annotator-level data for cross-domain use, and lays groundwork for future cross-linguistic and cross-domain research on political discourse.
Abstract
The rise of populism concerns many political scientists and practitioners, yet the detection of its underlying language remains fragmentary. This paper aims to provide a reliable, valid, and scalable approach to measuring populist stances. For that purpose, we created an annotated dataset based on parliamentary speeches of the German Bundestag (2013 to 2021). Following the ideational definition of populism, we label moralizing references to the virtuous people or the corrupt elite as core dimensions of populist language. To identify, in addition, how the thin ideology of populism is thickened, we annotate how populist statements are attached to left-wing or right-wing host ideologies. We then train a transformer-based model (PopBERT) as a multilabel classifier to detect and quantify each dimension. A battery of validation checks reveals that the model has strong predictive accuracy, provides high qualitative face validity, matches party rankings of expert surveys, and detects out-of-sample text snippets correctly. PopBERT enables dynamic analyses of how German-speaking politicians and parties use populist language as a strategic device. Furthermore, the annotator-level data may be used in cross-domain applications or to develop related classifiers.
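To make the term "multilabel classifier" concrete, the following minimal sketch shows how per-sentence scores for the four dimensions could be turned into independent yes/no decisions and then aggregated per group. It is an illustration under stated assumptions, not the authors' implementation: the label identifiers, the 0.5 default threshold, the example logits, the group names "A" and "B", and the helper functions predict_labels and share_by_group are all hypothetical.

```python
import math

# Label names mirror the four dimensions named in the abstract;
# the exact identifiers are an assumption made for this sketch.
LABELS = ("anti_elitism", "people_centrism", "left_wing", "right_wing")


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, thresholds=None):
    """Turn one sentence's per-label logits into independent decisions.

    In a multilabel setup each label gets its own sigmoid, so a sentence
    can carry several labels at once, or none at all.
    """
    if thresholds is None:
        # Assumed default; in practice thresholds may be tuned per label.
        thresholds = {label: 0.5 for label in LABELS}
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    return {label: probs[label] >= thresholds[label] for label in LABELS}


def share_by_group(predictions, groups, label):
    """Fraction of sentences per group (e.g. a party) flagged with `label`."""
    flagged, totals = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        flagged[group] = flagged.get(group, 0) + int(pred[label])
    return {group: flagged[group] / totals[group] for group in totals}


# Made-up logits for three sentences from two hypothetical groups.
example_logits = [
    [2.0, -1.0, -3.0, 0.5],
    [-2.0, -1.0, -3.0, -3.0],
    [2.0, 2.0, -3.0, -3.0],
]
example_groups = ["A", "A", "B"]
example_predictions = [predict_labels(row) for row in example_logits]
example_shares = share_by_group(example_predictions, example_groups, "anti_elitism")
```

Group-level shares of this kind are one simple way to obtain a ranking that could be compared against expert surveys; the aggregation used in the paper may differ.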
