Automate or Assist? The Role of Computational Models in Identifying Gendered Discourse in US Capital Trial Transcripts
Andrea W Wen-Yi, Kathryn Adamson, Nathalie Greenfield, Rachel Goldberg, Sandra Babcock, David Mimno, Allison Koenecke
TL;DR
The paper tackles the challenge of identifying gender-biased language in lengthy US capital-trial transcripts, where bias is subtle and recognizing it requires legal expertise. It presents a three-phase approach: Critical Discourse Analysis by legal experts, a fine-tuned in-domain NLP model (LEGAL-BERT) with context-aware annotation, and a synthesis phase in which expert judgments are used to evaluate and refine model-assisted annotations. Findings show that computational tools are valuable for surfacing relevant passages and provoking discussion, but cannot replace expert judgment in complex, nuanced tasks; the study highlights the importance of recall-focused design and collaborative workflows. Taken together, the work offers practical guidelines for integrating computational methods into high-stakes legal annotation pipelines to scale analysis while preserving expert consensus and accountability.
Abstract
The language used by US courtroom actors in criminal trials has long been studied for biases. However, systematic studies of bias in high-stakes court trials have been difficult because of the nuanced nature of bias and the legal expertise required. Large language models offer the possibility of automating annotation, but validating the computational approach requires an understanding of both how automated methods fit into existing annotation workflows and what they really offer. We present a case study of adding a computational model to a complex and high-stakes problem: identifying gender-biased language in US capital trials of women defendants. Our team of experienced death-penalty lawyers and NLP technologists pursues a three-phase study: first annotating manually, then training and evaluating computational models, and finally comparing expert annotations to model predictions. Unlike many typical NLP tasks, annotating for gender bias in months-long capital trials is complicated, with many individual judgment calls. Contrary to standard arguments for automation that are based on efficiency and scalability, legal experts find the computational models most useful in providing opportunities to reflect on their own bias in annotation and to build consensus on annotation rules. This experience suggests that seeking to replace experts with computational models for complex annotation is both unrealistic and undesirable. Rather, computational models offer valuable opportunities to assist the legal experts in annotation-based studies.
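
To make the third phase (comparing expert annotations to model predictions) concrete, the following is a minimal sketch of a recall-first comparison. It is not the authors' implementation: the function names, the 0/1 label encoding, the per-passage score list, and the `min_recall` target are all assumptions made for illustration. The idea is to choose the strictest score threshold that still recovers a target share of expert-flagged passages, and then to list the passages the model flags but the experts did not, as candidates for discussion rather than as automatic labels.

```python
from typing import List, Tuple


def precision_recall(expert: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Precision and recall of binary predictions against expert labels (1 = flagged)."""
    tp = sum(1 for e, p in zip(expert, predicted) if e == 1 and p == 1)
    fp = sum(1 for e, p in zip(expert, predicted) if e == 0 and p == 1)
    fn = sum(1 for e, p in zip(expert, predicted) if e == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def recall_first_threshold(expert: List[int], scores: List[float],
                           min_recall: float = 0.9) -> float:
    """Return the highest threshold whose recall is at least min_recall.

    Hypothetical helper: `scores` are assumed to be per-passage model scores,
    where a higher score means "more likely to be flagged".
    """
    for t in sorted(set(scores), reverse=True):
        predicted = [1 if s >= t else 0 for s in scores]
        _, recall = precision_recall(expert, predicted)
        if recall >= min_recall:
            return t
    # Reached only if no threshold meets the target (for example, when the
    # experts flagged nothing); fall back to flagging every passage.
    return min(scores)


def passages_for_discussion(expert: List[int], scores: List[float],
                            threshold: float) -> List[int]:
    """Indices of passages the model flags but the experts did not."""
    return [i for i, (e, s) in enumerate(zip(expert, scores))
            if e == 0 and s >= threshold]


# Toy data, invented for illustration only.
expert_labels = [1, 0, 1, 0, 1]
model_scores = [0.9, 0.8, 0.6, 0.3, 0.2]

threshold = recall_first_threshold(expert_labels, model_scores, min_recall=0.6)
flags = [1 if s >= threshold else 0 for s in model_scores]
precision, recall = precision_recall(expert_labels, flags)
to_review = passages_for_discussion(expert_labels, model_scores, threshold)
```

Under this design, passages in `to_review` are not treated as model errors by default; they are returned to the experts, which matches the finding that the models are most useful for prompting reflection and consensus-building.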
