Fairness Definitions in Language Models Explained

Zhipeng Yin; Zichong Wang; Avash Palikhe; Wenbin Zhang

TL;DR

This survey addresses the fragmentation in fairness definitions for language models by proposing an architecture-aware taxonomy that separates encoder-only, decoder-only, and encoder-decoder LMs. It systematically reviews intrinsic and extrinsic notions within each class and demonstrates these definitions through targeted experiments across diverse bias domains and benchmarks. The work contributes a comprehensive framework, explicit notations, and publicly available resources to reproduce and apply fairness concepts in practice, while outlining key challenges such as intersectionality and balancing fairness with knowledge quality. By clarifying how biases manifest across architectures and tasks, the paper provides a foundation for developing targeted, reproducible mitigation strategies with real-world impact in high-stakes NLP applications.

Abstract

Language Models (LMs) have demonstrated exceptional performance across various Natural Language Processing (NLP) tasks. Despite these advancements, LMs can inherit and amplify societal biases related to sensitive attributes such as gender and race, limiting their adoption in real-world applications. Therefore, fairness has been extensively explored in LMs, leading to the proposal of various fairness notions. However, the lack of clear agreement on which fairness definition to apply in specific contexts and the complexity of understanding the distinctions between these definitions can create confusion and impede further progress. To this end, this paper proposes a systematic survey that clarifies the definitions of fairness as they apply to LMs. Specifically, we begin with a brief introduction to LMs and fairness in LMs, followed by a comprehensive, up-to-date overview of existing fairness notions in LMs and the introduction of a novel taxonomy that categorizes these concepts based on their transformer architecture: encoder-only, decoder-only, and encoder-decoder LMs. We further illustrate each definition through experiments, showcasing their practical implications and outcomes. Finally, we discuss current research challenges and open questions, aiming to foster innovative ideas and advance the field. The repository is publicly available online at https://github.com/vanbanTruong/Fairness-in-Large-Language-Models/tree/main/definitions.

Paper Structure (33 sections, 11 equations, 18 figures, 14 tables)

Introduction
Taxonomy
Background, Notations and Experimental setup
Language Models
Notations
Experimental setup
Fairness definitions for encoder-only language models
Intrinsic bias for encoder-only LMs
Similarity-based disparity
Probability-based disparity
Extrinsic bias for encoder-only LMs
Equal Opportunity
Fair Inference
Context-based disparity
Fairness definitions for decoder-only language models
...and 18 more sections

Figures (18)

Figure 1: An overview of the proposed taxonomy of fairness definitions in language models.
Figure 2: An example of similarity-based bias in encoder-only LMs.
Figure 3: An example of probability-based bias with masked token metrics in encoder-only LMs.
Figure 4: An example of probability-based bias with pseudo-log-likelihood metrics in encoder-only LMs.
Figure 5: An example of the extrinsic bias of encoder-only LMs in a classification task.
...and 13 more figures
