Identifying Bias in Machine-generated Text Detection

Kevin Stowe, Svetlana Afanaseva, Rodolfo Raimundo, Yitao Sun, Kailash Patil

TL;DR

This study systematically investigates bias in machine-generated text detection across four sensitive attributes (gender, race/ethnicity, ELL status, and economic status) using 16 detectors and a combined cohort of student essays. It applies a logistic regression framework with confounders to quantify bias via performance differences, coefficients, and dominance scores, supplemented by subgroup analyses and human annotations. The findings indicate that ELL status consistently increases misclassification as machine-generated, with non-White ELL essays particularly affected; economic-status effects are mixed and race/gender biases are not universally present. Human annotators perform near chance and exhibit no clear attribute biases, underscoring the need for bias-aware deployment and carefully chosen evaluation datasets and metrics to mitigate harms in real-world applications.

Abstract

The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was generated using a model or written by a person. While detection models show strong performance, they have the capacity to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of the effects, as well as performing subgroup analysis. We find that while biases are generally inconsistent across systems, there are several key issues: several models tend to classify disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, economically disadvantaged students' essays are less likely to be classified as machine-generated, and non-White ELL essays are disproportionately classified as machine-generated relative to their White counterparts. Finally, we perform human annotation and find that while humans perform generally poorly at the detection task, they show no significant biases on the studied attributes.
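
As a rough illustration of the kind of quantity the subgroup analysis is concerned with, the sketch below computes, for human-written essays only, how often a detector flags each group as machine-generated, along with the unadjusted log odds ratio between two groups. That ratio is what a logistic regression with a single binary attribute and no confounders would estimate as its coefficient. The paper's actual framework includes confounders and further measures that are not reproduced here. The function names, the record layout, and the toy counts are all hypothetical and are not taken from the paper's data.

```python
import math
from collections import defaultdict


def flag_rates(records, attribute):
    """Share of human-written essays flagged as machine-generated, per group.

    `records` is a list of dicts holding a boolean "flagged" key and the
    grouping attribute (hypothetical layout, for illustration only).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for rec in records:
        group = rec[attribute]
        counts[group][0] += 1 if rec["flagged"] else 0
        counts[group][1] += 1
    return {g: flagged / total for g, (flagged, total) in counts.items()}


def log_odds_ratio(rates, group, reference):
    """Unadjusted log odds ratio of being flagged, group vs. reference.

    Undefined when either rate is exactly 0 or 1.
    """
    def odds(p):
        return p / (1.0 - p)

    return math.log(odds(rates[group]) / odds(rates[reference]))


# Toy data (invented): 10 ELL essays with 4 flagged, 10 non-ELL with 1 flagged.
records = (
    [{"ell": True, "flagged": True}] * 4
    + [{"ell": True, "flagged": False}] * 6
    + [{"ell": False, "flagged": True}] * 1
    + [{"ell": False, "flagged": False}] * 9
)

rates = flag_rates(records, "ell")
gap = rates[True] - rates[False]          # difference in false-positive rate
lor = log_odds_ratio(rates, True, False)  # positive: ELL flagged more often

print(f"ELL rate={rates[True]:.2f}, non-ELL rate={rates[False]:.2f}")
print(f"gap={gap:.2f}, log odds ratio={lor:.3f}")
```

A positive gap or log odds ratio in this toy setup means the hypothetical detector flags ELL essays more often than non-ELL essays. Whether such a difference survives adjustment for confounders is the question a regression-based analysis is meant to answer.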

Paper Structure

This paper contains 48 sections, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Pseudo-$R^2$ values from the regression analysis plotted against AUROC scores for each model.
  • Figure 2: Pearson correlation for the predictions for each model.