Inducing Group Fairness in Prompt-Based Language Model Decisions

James Atwood, Nino Scherrer, Preethi Lahoti, Ananth Balashankar, Flavien Prost, Ahmad Beirami

TL;DR

The paper investigates equal opportunity fairness in two LM-based classification paradigms, prompt-based and embedding-based, and finds significant group disparities in false positive rates across religious and other demographic groups. It adapts three remediation families (prompting, in-processing with a Min Diff/MMD regularizer, and post-processing with an emfairening head) and evaluates them on the Civil Comments Identity dataset. Embedding-based classifiers generally outperform prompt-based ones, with in-processing providing the strongest fairness-performance tradeoffs; prompting offers only limited improvements. Post-processing transfers to unseen models, highlighting the potential for universal fairness heads, while prompting methods are less controllable. Overall, the work emphasizes the need for LM-structure-aware remediation and provides practical guidance on when to apply in-processing versus post-processing for fair LM-based decision making.
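As a rough illustration of the in-processing family named above, the following is a minimal sketch, not the paper's implementation, of a MinDiff-style training objective: a standard classification loss plus an MMD penalty between the score distributions of two groups, weighted by a coefficient lambda. The RBF kernel, the binary group encoding, and the restriction to negative examples (which targets false positive rate gaps, i.e. equal opportunity) are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def rbf_kernel(x, y, sigma=1.0):
        # Gaussian kernel matrix between two 1-D score vectors.
        diff = x.unsqueeze(1) - y.unsqueeze(0)
        return torch.exp(-(diff ** 2) / (2 * sigma ** 2))

    def mmd(a, b, sigma=1.0):
        # Squared maximum mean discrepancy between two score samples.
        return (rbf_kernel(a, a, sigma).mean()
                - 2 * rbf_kernel(a, b, sigma).mean()
                + rbf_kernel(b, b, sigma).mean())

    def in_processing_loss(logits, labels, group_ids, lam=1.0):
        # Standard task loss on all examples.
        task = F.binary_cross_entropy_with_logits(logits, labels.float())
        # MinDiff-style penalty: pull the score distributions of negative
        # (non-toxic) examples from the two groups together, which
        # targets the false positive rate gap between groups.
        scores = torch.sigmoid(logits)
        neg = labels == 0
        a = scores[neg & (group_ids == 0)]
        b = scores[neg & (group_ids == 1)]
        penalty = mmd(a, b) if len(a) > 1 and len(b) > 1 else scores.sum() * 0.0
        return task + lam * penalty

Sweeping lam in a loss of this shape is what traces out fairness-performance curves like the Pareto frontiers shown in Figure 2.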

Abstract

Classifiers are used throughout industry to enforce policies, ranging from the detection of toxic content to age-appropriate content filtering. While these classifiers serve important functions, it is also essential that they are built in ways that minimize unfair biases for users. One such fairness consideration is called group fairness, which requires that different sub-populations of users receive equal treatment. This is a well-studied problem in the context of 'classical' classifiers. However, the emergence of prompt-based language model (LM) decision making has created new opportunities to solve text-based classification tasks, and the fairness properties of these new classifiers are not yet well understood. Further, the 'remediation toolkit' is incomplete for LM-based decision makers, and little is understood about how to improve their group fairness while maintaining classifier performance. This work sets out to add more tools to that toolbox. We introduce adaptations of existing effective approaches from classical classifier fairness to the prompt-based classifier space. We also devise simple methods that take advantage of the new structure of prompt-based decision makers and operate at the prompt level. We compare these approaches empirically on real data. Our results suggest that adaptations of approaches that are effective for classical classifiers remain effective in the LM-based classifier environment. However, there is room for further exploration of prompt-based remediation methods (and other remediation methods that take advantage of LM structure).

Paper Structure

This paper contains 25 sections, 4 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Classification flow diagrams for prompt-based and embedding-based classifiers. Decisions are encouraged via 'text wrappers' that nudge the LM to make a classification decision. For prompt-based classifiers, we treat the wrapped text as a prefix and query the LM for the scores of two postfix tokens (such as 'Yes' or 'No') that represent positive and negative decisions. We apply a softmax to these scores to obtain a probability distribution over the classification result and use this for decision making. For embedding-based classifiers, we assume that the LM is 'introspective' and can supply its activations. We instead query the LM for the activations of its last layer to serve as an embedding. We collect those embeddings into a design matrix, then fit a logistic regression model on that matrix and the corresponding labels. The logistic regression model is then used for downstream decision making. (A minimal code sketch of both constructions follows this list.)
  • Figure 2: Pareto frontiers of different remediation techniques. The left plot shows the performance and fairness of prompt-based classifiers, and the middle plot shows those of embedding-based classifiers. The unremediated classifier setting is denoted by a '+' and prompting-based remediation methods are denoted by single symbols. Note that the in-processing baseline is inapplicable to prompt-based classifiers. Each point for in-processing and post-processing is generated by setting different values of $\lambda$ in the in-processing and post-processing loss equations in the appendix. The dashed and solid lines give the Pareto frontier, where performance can only be gained by sacrificing fairness, for post-processing and in-processing, respectively. The right plot gives the effect of model transfer: we fit a post-processing remediation model to the PaLM 2 S model, then compare the effects of applying it to PaLM 2 S (native) versus the larger PaLM 2 L model (transfer). The lines give the Pareto frontier (solid for native, dashed for transfer).
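To make the two constructions in Figure 1 concrete, here is a minimal sketch under stated assumptions: a hypothetical lm object exposing a continuation log-score (lm.score) and a last-layer embedding (lm.embed), plus an illustrative wrapper prompt and decision threshold. None of these names correspond to a specific model API or to the paper's code.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def prompt_based_decision(lm, text):
        # Wrap the input so the LM is nudged toward a yes/no decision.
        prompt = f"Is the following comment toxic?\n{text}\nAnswer:"
        # Score the two candidate postfix tokens (assumed lm.score API)
        # and softmax them into a probability of a positive decision.
        yes = lm.score(prompt, " Yes")
        no = lm.score(prompt, " No")
        p_yes = np.exp(yes) / (np.exp(yes) + np.exp(no))
        return p_yes >= 0.5, p_yes

    def embedding_based_classifier(lm, texts, labels):
        # Collect last-layer activations into a design matrix
        # (assumed lm.embed API) and fit a logistic regression head
        # that is used for downstream decision making.
        X = np.stack([lm.embed(t) for t in texts])
        return LogisticRegression(max_iter=1000).fit(X, labels)

The post-processing remediation compared in Figure 2 operates on top of classifier outputs like these; its exact objective is given by the post-processing loss equation in the paper's appendix.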