Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence

Andres Karjus

TL;DR

This paper argues for a systematic framework that builds on mixed methods quantitizing and converting design principles, and on feature analysis from linguistics, to transparently integrate human expertise with the scalability of large language models.

Abstract

The increasing capacities of large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, by automating complex qualitative tasks otherwise typically carried out by human researchers. While numerous benchmarking studies have assessed the analytic prowess of LLMs, there is less focus on operationalizing this capacity for inference and hypothesis testing. Addressing this challenge, a systematic framework is argued for here, building on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability. Replicability and statistical robustness are discussed, including how to incorporate machine annotator error rates in subsequent inference. The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks, including analysis of themes, stances, ideas, and genre compositions; linguistic and semantic annotation, interviews, text mining and event cause inference in noisy historical data, literary social network construction, metadata imputation, and multimodal visual cultural analytics. Using hypothesis-driven topic classification instead of "distant reading" is discussed. The replications among the experiments also illustrate how tasks previously requiring protracted team effort or complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, the approach is not intended to replace, but to augment and scale researcher expertise and analytic practices. With these opportunities in sight, qualitative skills and the ability to pose insightful questions have arguably never been more critical.

Paper Structure (45 sections, 1 equation, 9 figures, 1 table)

Introduction
Related zero-shot LLM applicability research
Related feature analytic and mixed methods approaches
A machine-assisted quantitizing design (MAQD)
What this contribution does
What this contribution is not: three disclaimers
Method details and statistical considerations
Conceptual method comparison
Necessity of statistical modeling to avoid unintentional quasi-quantifying designs
Incorporating classification error in statistical modeling
Results of case studies
Example case: topic classification instead of latent topic modeling
Examples of historical event cause analysis and missing data augmentation
Example of relevance classification with LLM-driven OCR correction in digitized newspaper corpora
Linguistic usage feature analysis applications
...and 30 more sections

Figures (9)

Figure 1: A typical QD pipeline. Qualitative elements are outlined in yellow, quantitative in blue. Steps where machine assistance (ML, LLMs, or otherwise) may be applied are in bold, including the quantitization step. Annotating a smaller additional test set is optional but strongly recommended if using either multiple human annotators or a machine.
Figure 2: A bootstrapping-driven pipeline for estimating the uncertainty in a machine-annotated categorical data variable $V$. The crucial component is the test set for comparing human expert annotation (ground truth) and machine predictions (or that of human coders). This provides an estimate of annotator accuracy and class confusion within the variable, which can then be used in bootstrapping the confidence intervals for the statistic of interest.
Figure 3: (A) Zero-shot prediction of predefined topics in the corpus of Soviet newsreel synopses. Vertical axis shows yearly aggregate percentages. Bootstrapped confidence intervals are added to the trend of the Social topic. There is less data in the later years, which is reflected in the wider intervals. (B) Wrecking causes of ships found in the Baltic Sea, mostly in Estonian waters, as annotated by experts based on field notes and historical documents (left), compared to zero-shot prediction of the same categories based on the same data, with bootstrapped confidence intervals on the counts. Due to fairly good classification accuracy, the counts end up roughly similar.
Figure 4: Social networks of interacting characters in "Les Misérables" by Victor Hugo: a manually constructed textbook example on the left (A), and a network automatically inferred using an LLM on the right (B; men are blue and women orange).
Figure 5: Zero-shot classification of genre across one book and its film adaptation, split into equally-sized segments and scenes, respectively. Frames from the film are added for illustration. Differences and similarities become readily apparent, and can provide a basis for follow-up qualitative or quantitative comparisons.
...and 4 more figures
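
The bootstrapping idea described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`confusion_given_prediction`, `bootstrap_share`), the choice of statistic (the share of one class), the percentile interval, and the decision to resample the test set in every replicate are all assumptions made for illustration. The only ingredients taken from the caption are a human-annotated test set compared against machine predictions, and the use of the resulting class confusion when bootstrapping confidence intervals.

```python
import random
from collections import Counter, defaultdict


def confusion_given_prediction(test_pairs):
    """Estimate P(human label | machine label) from (human, machine) pairs."""
    counts = defaultdict(Counter)
    for human, machine in test_pairs:
        counts[machine][human] += 1
    # Store labels and their counts; the counts serve as sampling weights.
    return {m: (list(c.keys()), list(c.values())) for m, c in counts.items()}


def bootstrap_share(machine_labels, test_pairs, target, n_boot=1000, seed=0):
    """Percentile interval for the share of `target`, propagating annotator error.

    Hypothetical helper: each replicate resamples both the test set (so the
    error-rate estimate itself varies) and the machine-annotated data, then
    redraws a plausible "true" label for every machine label.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_boot):
        boot_test = rng.choices(test_pairs, k=len(test_pairs))
        conf = confusion_given_prediction(boot_test)
        sample = rng.choices(machine_labels, k=len(machine_labels))
        hits = 0
        for m in sample:
            if m in conf:
                labels, weights = conf[m]
                true = rng.choices(labels, weights=weights, k=1)[0]
            else:
                # Assumption: a label unseen in the resampled test set is kept as is.
                true = m
            hits += true == target
        shares.append(hits / len(sample))
    shares.sort()
    lower = shares[int(0.025 * n_boot)]
    upper = shares[min(n_boot - 1, int(0.975 * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy data: (human label, machine label) pairs and a machine-annotated corpus.
    test = [("social", "social")] * 18 + [("other", "social")] * 2 + [("other", "other")] * 20
    corpus = ["social"] * 300 + ["other"] * 700
    print(bootstrap_share(corpus, test, target="social"))
```

With a less accurate machine annotator or a smaller test set, the interval widens, which is the behavior one would want to carry into subsequent inference.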
