Gender Bias Detection in Court Decisions: A Brazilian Case Study

Raysa Benatti; Fabiana Severi; Sandra Avila; Esther Luna Colombini

TL;DR

This work addresses the detection of gender bias in Brazilian court decisions using an attention-based NLP approach on Brazilian Portuguese texts. It introduces an experimental framework built around two domain-specific datasets from the São Paulo state court system, Domestic Violence Cases (DVC) and Parental Alienation Cases (PAC), with domain-expert annotation and a chunk-based text preprocessing strategy feeding a BERTimbau classifier. Key contributions include the data collection and annotation protocol, the datasets with rich metadata, and an evaluation pipeline that uses data augmentation and two fine-tuning regimes to detect biased content at the decision level. Findings show that data augmentation and careful fine-tuning improve detection performance, though generalization is constrained by limited annotated data. The authors emphasize ethical considerations, reproducibility, and the essential role of domain expertise for responsible deployment. Overall, the paper offers a proof of concept for scalable diagnosis of institutional gender biases in court activity and highlights practical guidelines for data handling, domain involvement, and value-sensitive design in socially impactful NLP applications.

Abstract

Data derived from the realm of the social sciences is often produced in digital text form, which motivates its use as a source for natural language processing methods. Researchers and practitioners have developed and relied on artificial intelligence techniques to collect, process, and analyze documents in the legal field, especially for tasks such as text summarization and classification. While increasing procedural efficiency is often the primary motivation behind natural language processing in the field, several works have proposed solutions for human rights-related issues, such as assessment of public policy and institutional social settings. One such issue is the presence of gender biases in court decisions, which has been largely studied in social sciences fields; biased institutional responses to gender-based violence are a violation of international human rights dispositions since they prevent gender minorities from accessing rights and hamper their dignity. Natural language processing-based approaches can help detect these biases on a larger scale. Still, the development and use of such tools require researchers and practitioners to be mindful of legal and ethical aspects concerning data sharing and use, reproducibility, domain expertise, and value-charged choices. In this work, we (a) present an experimental framework developed to automatically detect gender biases in court decisions issued in Brazilian Portuguese and (b) describe and elaborate on features we identify to be critical in such a technology, given its proposed use as a support tool for research and assessment of court activity.

Paper Structure (24 sections, 2 figures, 6 tables)

This paper contains 24 sections, 2 figures, 6 tables.

Introduction
Institutional Gender Bias
Related work
Framework
Data
Data Annotation
Biases.
Data Preparation
Experimental Design
Data Augmentation
Model and Parameters
Evaluation and Validation Methods
Main Findings
Discussion
Data sharing and reproducibility.
...and 9 more sections

Figures (2)

Figure 1: High-level view on the methodology. It comprises three blocks: the first one, Data, includes collection, annotation, and preparation with cleaning and chunk extraction, generating Domestic Violence Cases (DVC) and Parental Alienation Cases (PAC) datasets; they are the input of the second block, Experiments, containing training of BERTimbau-based models for binary classification, with data augmentation and fine-tuning protocols. Finally, the third block, Validation, includes evaluation and testing.
Figure 2: A representation of the experimental pipeline. It starts with a JSON file, the annotated dataset, which is tokenized (encoded). The encoded texts are split and become the split dataset, made of portions for training, validation, and testing. The training set is augmented online and, along with the validation set, is fed into a supervised classification process; the test set is fed into a validation pipeline.
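
The first stages described in the Figure 2 caption (annotated JSON, encoding, splitting) can be illustrated with a minimal Python sketch. It is not the authors' code: the field names `text` and `label`, the whitespace `encode` function (a stand-in for the real BERTimbau subword tokenizer), the 70/15/15 split, and the fixed seed are all assumptions made for illustration. Online augmentation and the classifier itself are omitted.

```python
import json
import random


def encode(text):
    # Hypothetical stand-in for a subword tokenizer: lowercase whitespace split.
    return text.lower().split()


def split_dataset(records, train_pct=70, val_pct=15, seed=0):
    # Shuffle a copy deterministically, then cut it into three portions.
    # The percentages are assumed; integer arithmetic avoids rounding surprises.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def run_pipeline(json_text):
    # Annotated dataset (JSON) -> encoded records -> split dataset.
    annotated = json.loads(json_text)
    encoded = [
        {"tokens": encode(rec["text"]), "label": rec["label"]}
        for rec in annotated
    ]
    return split_dataset(encoded)


# Toy input with assumed field names; real records would hold decision chunks.
sample = json.dumps(
    [{"text": f"Decision text {i}", "label": i % 2} for i in range(20)]
)
splits = run_pipeline(sample)
```

In this sketch the training and validation portions would go to the supervised classification step, while the test portion is held back for the validation pipeline, mirroring the flow in the caption.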
