Arti-"fickle" Intelligence: Using LLMs as a Tool for Inference in the Political and Social Sciences

Lisa P. Argyle, Ethan C. Busby, Joshua R. Gubler, Bryce Hepner, Alex Lyman, David Wingate

TL;DR

The paper addresses how social scientists should use large language models to advance inference rather than simply demonstrate capabilities. It argues for explicit targets of inference and systematic validation, including transparent reporting of both successes and failures, to enable cumulative knowledge. It proposes a pragmatic, two-pronged validation framework (pre-registering either the validation itself or the validation standards) that adapts to rapidly changing models while preserving transparency. The discussion also calls for attention to governance, ethics, and reproducibility, arguing that humility and cross-disciplinary collaboration are essential to unlock LLMs' scientific value.

Abstract

Generative large language models (LLMs) are incredibly useful, versatile, and promising tools. However, they will be of most use to political and social science researchers when they are used in a way that advances understanding about real human behaviors and concerns. To promote the scientific use of LLMs, we suggest that researchers in the political and social sciences need to remain focused on the scientific goal of inference. To this end, we discuss the challenges and opportunities related to scientific inference with LLMs, using validation of model output as an illustrative case for discussion. We propose a set of guidelines related to establishing the failure and success of LLMs when completing particular tasks, and discuss how we can make inferences from these observations. We conclude with a discussion of how this refocus will improve the accumulation of shared scientific knowledge about these tools and their uses in the social sciences.

Paper Structure

This paper contains 11 sections.