Towards Human-Level Text Coding with LLMs: The Case of Fatherhood Roles in Public Policy Documents

Lorenzo Lupo; Oscar Magnusson; Dirk Hovy; Elin Naurin; Lena Wängnerud

TL;DR

This paper investigates automated text coding in political science using large language models by prompting them with a full human codebook and labeled examples. It evaluates GPT-3.5, GPT-4, and open-source LLMs on a Swedish case study of fatherhood roles in policy documents, showing that multi-task prompting with exhaustive label descriptions can reach or exceed human coder performance while significantly reducing time and cost. The results indicate that GPT-4 excels on complex tasks, that open-source models can be viable on simpler tasks, and that coding all three tasks jointly is considerably cheaper than running them separately. The authors provide open-source tooling and a detailed appendix to guide replication and practical deployment in large-scale political text annotation.

Abstract

Recent advances in large language models (LLMs) like GPT-3.5 and GPT-4 promise automation with better results and less programming, opening up new opportunities for text analysis in political science. In this study, we evaluate LLMs on three original coding tasks involving typical complexities encountered in political science settings: a non-English language, legal and political jargon, and complex labels based on abstract constructs. Throughout the paper, we propose a practical workflow to optimize the choice of the model and the prompt. We find that the best prompting strategy consists of providing the LLMs with a detailed codebook, like the one provided to human coders. In this setting, an LLM can be as good as or possibly better than a human annotator while being much faster, considerably cheaper, and much easier to scale to large amounts of text. We also provide a comparison of GPT and popular open-source LLMs, discussing the trade-offs involved in choosing a model. Our software allows LLMs to be easily used as annotators and is publicly available: https://github.com/lorelupo/pappa.
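To make the codebook-based prompting strategy concrete, the following is a minimal sketch of how a prompt covering all three tasks might be assembled from label descriptions and a few labeled examples. It is not the authors' implementation: the function `build_prompt`, the task keys, the label names, the descriptions, and the example sentence are all hypothetical placeholders, and the instruction wording is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical codebook: task keys, label names, and descriptions are
# placeholders for illustration only, not the labels used in the paper.
CODEBOOK: Dict[str, Dict[str, str]] = {
    "involvement": {
        "LABEL_A": "Placeholder description of one type of paternal involvement.",
        "LABEL_B": "Placeholder description of another type of paternal involvement.",
    },
    "explicitness": {
        "EXPLICIT": "Placeholder: the father's role is stated directly.",
        "IMPLICIT": "Placeholder: the father's role is only implied.",
    },
    "normativeness": {
        "NORMATIVE": "Placeholder: the sentence says what fathers should do.",
        "DESCRIPTIVE": "Placeholder: the sentence describes what fathers do.",
    },
}

# Each example pairs a sentence with one label per task (hypothetical data).
Example = Tuple[str, Dict[str, str]]
EXAMPLES: List[Example] = [
    (
        "Placeholder example sentence.",
        {"involvement": "LABEL_A", "explicitness": "EXPLICIT", "normativeness": "DESCRIPTIVE"},
    ),
]


def build_prompt(
    codebook: Dict[str, Dict[str, str]],
    examples: List[Example],
    sentence: str,
) -> str:
    """Assemble one multi-task prompt: codebook, labeled examples, then the query."""
    lines = ["Code the sentence on every task below. Answer with one label per task."]
    # Full label descriptions for every task, as a human coder would receive them.
    for task, labels in codebook.items():
        lines.append(f"\nTask: {task}")
        for label, description in labels.items():
            lines.append(f"- {label}: {description}")
    # Optional labeled examples, formatted like the expected answer.
    if examples:
        lines.append("\nExamples:")
        for text, coding in examples:
            answer = "; ".join(f"{task}={label}" for task, label in coding.items())
            lines.append(f"Sentence: {text}\nAnswer: {answer}")
    # The sentence to be coded comes last, with an open answer slot.
    lines.append(f"\nSentence: {sentence}\nAnswer:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(CODEBOOK, EXAMPLES, "Placeholder sentence to be coded."))
```

Because the codebook and examples are sent once for all three tasks, a joint prompt of this kind needs fewer model calls than three separate single-task prompts, which is in line with the cost reduction noted in the TL;DR.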

Paper Structure (41 sections, 1 equation, 3 figures, 5 tables)

Introduction
The Example Case Study
Data
Methodology
Coding paradigm
Task 1 - Type of paternal involvement
Task 2 - Explicitness of the description
Task 3 - Normativeness of the description
Validation set
Prompt
Language Models
Prompt engineering
Tasks
Label descriptions
Number of example sentences
...and 26 more sections

Figures (3)

Figure 1: Average coding agreement with (other) human coders on task 1: type of paternal involvement (6 labels).
Figure 2: Average coding agreement with (other) human coders on task 2: explicitness of the description (2 labels).
Figure 3: Average coding agreement with (other) human coders on task 3: normativeness of the description (2 labels).
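The figure captions refer to a coder's average agreement with the (other) human coders. As an illustration only, the sketch below computes that quantity as simple percent agreement averaged over human coders. The exact agreement metric is not given here, so percent agreement is an assumption, and the function names and toy labels are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def agreement(a: List[str], b: List[str]) -> float:
    """Fraction of items on which two coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("codings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_agreement(candidate: List[str], human_codings: Dict[str, List[str]]) -> float:
    """Mean pairwise agreement between one coder and each human coder."""
    return mean(agreement(candidate, labels) for labels in human_codings.values())


if __name__ == "__main__":
    # Toy data: two human coders and one model coding four sentences.
    humans = {
        "coder_1": ["a", "b", "a", "b"],
        "coder_2": ["a", "a", "a", "b"],
    }
    model = ["a", "b", "a", "a"]
    print(average_agreement(model, humans))
```

To score a human coder on the same footing, that coder's own labels would be left out of `human_codings`, which is what "(other)" in the captions suggests.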
