
Classifying populist language in American presidential and governor speeches using automatic text analysis

Olaf van der Veen, Semir Dzebo, Levi Littvay, Kirk Hawkins, Oren Dar

TL;DR

This work tackles the challenge of measuring populist rhetoric by developing a pipeline that fine-tunes a strong sentence-level embedding model (SBERT) on curated, labeled populist, pluralist, and neutral sentences drawn from governor and presidential speeches. The approach yields robust, cross-context classification of populist language at the sentence, speech, and speaker levels, and it performs well with relatively small training datasets (as few as 70–100 sentences per category). Boundary analyses show that context and data sparsity affect performance but do not undermine overall effectiveness, which suggests the method can enable comprehensive, scalable monitoring of populist rhetoric across political systems. The study contributes a practical, efficient tool for social scientists and policymakers to assess populist framing in real-world political discourse and to explore how it varies over time and across contexts.
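The pipeline classifies individual sentences and then reports results at the speech and speaker levels. The aggregation rule is not stated in this summary, so the sketch below is only one plausible way to roll sentence labels up: it assumes a "net populism" score (share of populist sentences minus share of pluralist sentences) per speech, averages speech scores per speaker, and uses a hypothetical cut-off. The function names and the threshold are illustrative assumptions, not the authors' method.

```python
from statistics import mean

# Label names follow the three sentence classes described in the TL;DR.
POPULIST, PLURALIST, NEUTRAL = "populist", "pluralist", "neutral"


def speech_score(sentence_labels: list[str]) -> float:
    """Hypothetical net populism score for one speech, in [-1, 1].

    Assumed rule: (populist sentences - pluralist sentences) / all sentences.
    An empty speech scores 0.0.
    """
    if not sentence_labels:
        return 0.0
    n_pop = sum(1 for label in sentence_labels if label == POPULIST)
    n_plu = sum(1 for label in sentence_labels if label == PLURALIST)
    return (n_pop - n_plu) / len(sentence_labels)


def classify_speech(sentence_labels: list[str], threshold: float = 0.0) -> bool:
    """Flag a speech as populist if its net score exceeds a hypothetical threshold."""
    return speech_score(sentence_labels) > threshold


def speaker_score(speeches: list[list[str]]) -> float:
    """Assumed speaker-level aggregation: the mean of that speaker's speech scores."""
    return mean(speech_score(speech) for speech in speeches)
```

For example, a speech labeled `["populist", "neutral", "neutral", "pluralist", "populist"]` gets a net score of (2 - 1) / 5 = 0.2. Averaging speech scores weights every speech equally regardless of length; pooling all of a speaker's sentences would be an equally reasonable alternative.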

Abstract

Populism is a concept that is often used but notoriously difficult to measure. Common qualitative approaches such as holistic grading or content analysis require a great deal of time and labour, which makes it difficult to quickly determine which politicians should be classified as populist and which should not. Quantitative methods, meanwhile, show mixed results when it comes to classifying populist rhetoric. In this paper, we develop a pipeline to train and validate an automated classification model that estimates the use of populist language. We train models on sentences identified as populist or pluralist in 300 speeches by US governors from 2010 to 2018 and in 45 speeches by presidential candidates in 2016. These models classify most speeches correctly: 84% of governor speeches and 89% of presidential speeches. The results hold across time periods (92% accuracy on more recent American governors), with smaller training sets (as few as 70 training sentences per category achieve similar results), and when classifying politicians instead of individual speeches. This pipeline is thus an effective tool for the systematic and swift classification of populist language in politicians' speeches.
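The abstract reports that as few as 70 training sentences per category achieve similar results, and it summarises performance as the share of speeches classified correctly. A minimal sketch of the two bookkeeping steps such an experiment needs is shown below: drawing a class-balanced training subset of a given size, and computing accuracy against gold labels. The helper names, the `(sentence, label)` input format, and the fixed seed are assumptions for illustration; the model training step in between is deliberately left out.

```python
import random
from collections import defaultdict


def balanced_subsample(examples, n_per_class, seed=0):
    """Draw n_per_class (sentence, label) pairs from every label, without replacement.

    Hypothetical helper. Raises ValueError if any class has too few examples.
    """
    by_label = defaultdict(list)
    for sentence, label in examples:
        by_label[label].append((sentence, label))
    rng = random.Random(seed)  # fixed seed so repeated runs draw the same subset
    subset = []
    for label in sorted(by_label):  # sorted for a deterministic draw order
        pool = by_label[label]
        if len(pool) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(pool)} examples")
        subset.extend(rng.sample(pool, n_per_class))
    rng.shuffle(subset)  # avoid long runs of a single class in the training order
    return subset


def accuracy(gold, predicted):
    """Share of items whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

Repeating the draw with several seeds at each subset size (for example 70 or 100 per class) and averaging the resulting accuracies would give a learning curve of the kind summarised in "Model performance by number of sentences per class".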

Paper Structure (24 sections, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Confusion matrices for presidential candidates, speeches (left) and speakers (right)
  • Figure 2: Confusion matrices for governors with 2010-2018 terms, speeches (left) and speakers (right)
  • Figure 3: Confusion matrices for governors with 2018-2022 terms, speeches (left) and speakers (right)
  • Figure 4: Model performance by number of sentences per class
  • Figure 5: Scatter plots for presidential candidates, speeches (left) and speakers (right)
  • ...and 3 more figures