Automating Transparency Mechanisms in the Judicial System Using LLMs: Opportunities and Challenges

Ishana Shastri, Shomik Jain, Barbara Engelhardt, Ashia Wilson

TL;DR

The work addresses the challenge of making judicial processes more transparent by using large language models (LLMs) to extract structured information from unstructured court documents. It analyzes two case studies, jury selection in criminal trials and eviction proceedings, to evaluate LLM capabilities in information extraction, while highlighting limitations in accuracy and the difficulty of tasks that require inference. Performance varies widely across tasks (e.g., 81.6% for selected juror names, 3.6% for jury gender composition, and 95.8% for zip code), underscoring the need for targeted technical and legal investments, standardized data, and human oversight. The study provides a roadmap for improving data accessibility, pre-processing, and evaluation benchmarks so that automated transparency in the judiciary can be scaled responsibly and potential disparities reduced.

Abstract

Bringing more transparency to the judicial system for the purposes of increasing accountability often demands extensive effort from auditors who must meticulously sift through numerous disorganized legal case files to detect patterns of bias and errors. For example, the high-profile investigation into the Curtis Flowers case took seven reporters a full year to assemble evidence about the prosecutor's history of selecting racially biased juries. LLMs have the potential to automate and scale these transparency pipelines, especially given their demonstrated capabilities to extract information from unstructured documents. We discuss the opportunities and challenges of using LLMs to provide transparency in two important court processes: jury selection in criminal trials and housing eviction cases.

Paper Structure

This paper contains 45 sections, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Example of a jury selection voir dire transcript excerpt. We extracted these excerpts of the final jury roll call in order to improve performance on the tasks of extracting selected juror names and determining the jury's gender composition. The highlighted text is a disfluency that causes the model to miscount jurors.
  • Figure 2: Absolute error for the jury gender composition task across different technical interventions. Error bars represent the standard error over all iterations.
  • Figure 3: Example strike sheets showing the variance in note-taking that occurs to document juror demographics and strike status. Common demarcations include 'W'/'B' for race, 'F'/'M' for gender, SX/DX for state and defense strikes, and 'C' for for-cause strikes.
  • Figure 4: Example Summary Process Summons and Complaint issued by the landlord to call the tenant to court and inform them of the grounds of eviction.
  • Figure 5: Example docket entry page including the final disposition (Agreement for Judgement) of an eviction case. The variability in handwriting and format of this page makes it difficult to automatically extract information.
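
The Figure 2 caption refers to absolute error on the jury gender composition task, with error bars showing the standard error over iterations. The excerpt here does not define either quantity precisely, so the following is only a minimal sketch under stated assumptions: absolute error is taken to be the absolute difference between a predicted and a true count (for instance, the number of women on a jury), and the standard error is the sample standard deviation divided by the square root of the number of iterations. The function names and the numbers are hypothetical and are not results from the paper.

```python
import math
import statistics


def absolute_error(predicted: int, actual: int) -> int:
    """Absolute difference between a predicted and a true count (assumed definition)."""
    return abs(predicted - actual)


def mean_and_standard_error(errors):
    """Return (mean, standard error of the mean) for a list of per-iteration errors."""
    n = len(errors)
    mean = statistics.fmean(errors)
    if n < 2:
        # A single iteration gives no estimate of spread.
        return mean, 0.0
    return mean, statistics.stdev(errors) / math.sqrt(n)


# Hypothetical example: the true jury has 7 women, and five repeated
# model runs on the same transcript predict the counts below.
true_women = 7
predicted_women = [7, 6, 9, 7, 5]

errors = [absolute_error(p, true_women) for p in predicted_women]
mean_err, se_err = mean_and_standard_error(errors)
print(f"mean absolute error = {mean_err:.2f}, standard error = {se_err:.2f}")
```

Reporting the standard error alongside the mean makes it easier to judge whether a difference between two technical interventions exceeds run-to-run variation.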