On Classification with Large Language Models in Cultural Analytics

David Bamman, Kent K. Chang, Li Lucy, Naitian Zhou

TL;DR

Prompt-based LLMs are competitive with traditional supervised models on established tasks, but perform less well on de novo tasks.

Abstract

In this work, we survey the way in which classification is used as a sensemaking practice in cultural analytics, and assess where large language models can fit into this landscape. We identify ten tasks supported by publicly available datasets on which we empirically assess the performance of LLMs compared to traditional supervised methods, and explore the ways in which LLMs can be employed for sensemaking goals beyond mere accuracy. We find that prompt-based LLMs are competitive with traditional supervised models for established tasks, but perform less well on de novo tasks. In addition, LLMs can assist sensemaking by acting as an intermediary input to formal theory testing.

Paper Structure

This paper contains 49 sections, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Template for LLM sensemaking exercise, with anonymized integer labels, illustrated with data for the Strangeness task (0 = strange; 1 = not strange).
  • Figure 2: Searching for the text of a data point in the Folktales training set ("A father, now aged, had given over all his property to his children") brings up its label (982).
  • Figure 3: Template for LLM sensemaking exercise for regression task.
  • Figure 4: Learning rate sweep on development data for BERT across different tasks. The x axis is the $\log_{10}$ of the learning rate.
  • Figure 5: Learning rate sweep on development data for RoBERTa across different tasks. The x axis is the $\log_{10}$ of the learning rate.
  • ...and 1 more figure