Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models

Michael Burnham

TL;DR

Semantic Scaling uses zero-shot entailment labeling by large language models to extract survey-like stance data from text, then applies Bayesian item response theory to estimate ideology along researcher-defined dimensions. By distinguishing affective and policy dimensions and accommodating documents of varying length, it delivers interpretable ideal points that are comparable across contexts and align with established measures like DW-NOMINATE while offering greater flexibility. Two political applications, Twitter and the 117th Congress, demonstrate validity: the estimates recapture known distributions, align with human judgments, and expose nuanced group dynamics such as in-group/out-group affect. This approach enables ideology research in contexts where traditional survey data are hard to obtain, and it invites further development of domain-adapted models and software to broaden adoption.

Abstract

This paper introduces "Semantic Scaling," a novel method for ideal point estimation from text. I leverage large language models to classify documents based on their expressed stances and extract survey-like data. I then use item response theory to scale subjects from these data. Semantic Scaling significantly improves on existing text-based scaling methods and allows researchers to explicitly define the ideological dimensions they measure. This represents the first scaling approach that allows such flexibility outside of survey instruments and opens new avenues of inquiry for populations difficult to survey. Additionally, it works with documents of varying length and produces valid estimates of both mass and elite ideology. I demonstrate that the method can differentiate between policy preferences and in-group/out-group affect. Among the public, Semantic Scaling outperforms Tweetscores according to human judgement; in Congress, it recaptures the first dimension of DW-NOMINATE while allowing for greater flexibility in resolving construct validity challenges.
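
The two-step pipeline described above (stance labels first, item response theory second) can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the paper's implementation: the stance labels are hard-coded rather than produced by a language model, the function names `tally_stances` and `fit_ideal_points` are hypothetical, the model is a simple Rasch-style binomial form with normal priors, and it is fit by maximum a posteriori gradient ascent rather than full posterior sampling. Items are assumed to be coded so that "support" always points toward the same end of the scale.

```python
import numpy as np

def tally_stances(labels, n_subjects, n_items):
    """Turn (subject, item, stance) labels into support/total count matrices."""
    support = np.zeros((n_subjects, n_items))
    total = np.zeros((n_subjects, n_items))
    for subject, item, stance in labels:
        if stance == "neutral":  # documents that take no stance are skipped
            continue
        total[subject, item] += 1
        support[subject, item] += stance == "support"
    return support, total

def fit_ideal_points(support, total, prior_sd=1.0, n_iter=5000):
    """MAP fit of support ~ Binomial(total, sigmoid(theta_i - b_j)).

    theta_i is the subject's ideal point and b_j the item's difficulty;
    both get Normal(0, prior_sd) priors (an illustrative assumption).
    """
    n_subjects, n_items = support.shape
    theta = np.zeros(n_subjects)
    b = np.zeros(n_items)
    # Conservative step size from an upper bound on the log-posterior curvature.
    step = 1.0 / (0.5 * total.sum() + 1.0 / prior_sd**2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = support - total * p  # gradient of the log-likelihood w.r.t. the logit
        grad_theta = resid.sum(axis=1) - theta / prior_sd**2
        grad_b = -resid.sum(axis=0) - b / prior_sd**2
        theta += step * grad_theta
        b += step * grad_b
    return theta, b

def docs(subject, item, n_support, n_oppose):
    """Helper that builds toy labels for one subject-item pair."""
    return ([(subject, item, "support")] * n_support
            + [(subject, item, "oppose")] * n_oppose)

# Toy data: three subjects, two items, five stance-taking documents per pair.
labels = (docs(0, 0, 0, 5) + docs(0, 1, 1, 4)
          + docs(1, 0, 2, 3) + docs(1, 1, 3, 2)
          + docs(2, 0, 5, 0) + docs(2, 1, 4, 1)
          + [(1, 0, "neutral")])
support, total = tally_stances(labels, n_subjects=3, n_items=2)
theta, b = fit_ideal_points(support, total)
print(np.round(theta, 2))
```

In this toy setup the subject who mostly opposes both items should land at the low end of the scale and the subject who mostly supports them at the high end. Working from per-item counts rather than one response per subject is one way to accommodate subjects with differing numbers of documents.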

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Semantic Scaling recaptures the bimodal distribution and is highly correlated with Tweetscores.
  • Figure 2: Scatter plot of the 100 observations with the highest residuals when regressing Tweetscores on Semantic Scaling. Axes are drawn at the local minimum between the modes of the distributions to roughly divide liberals and conservatives. Human labels derived by reading a random sample of tweets from each user largely agree with Semantic Scaling over Tweetscores.
  • Figure 3: The correlations and standard errors between Semantic Scaling and DW-NOMINATE demonstrate that Semantic Scaling is able to recapture the distribution of DW-NOMINATE.
  • Figure 4: Both DW-NOMINATE and Semantic Scaling predict Mitt Romney, Susan Collins, Kyrsten Sinema, and Joe Manchin to be among the most moderate members of the Senate.
  • Figure 5: Both DW-NOMINATE and the Semantic Scale with all items show similar results for House subfactions. The MAGA Squad is clearly conservative while the Squad is perhaps more moderate than would be expected.
  • ...and 1 more figure