Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu, Zichong Wang, Wenbin Zhang
TL;DR
This survey addresses the risk of bias and discrimination in LLMs and proposes a unified taxonomy of fairness notions, evaluation metrics, mitigation algorithms, and evaluation resources. It covers three complementary perspectives, each grounded in both machine learning and linguistic context: metrics for quantifying bias (embedding-based, probability-based, and generation-based); mitigation techniques at the pre-, in-, intra-, and post-processing stages; and public datasets and toolkits for bias assessment. It identifies training data bias, embedding bias, and label bias as the primary sources of bias, and surveys concrete methods ranging from counterfactual data augmentation to adapter-based debiasing and decoding-time interventions. The work offers guidance for researchers and practitioners on evaluating bias rigorously, selecting appropriate debiasing techniques, and navigating open challenges in achieving fair LLMs.
Abstract
Large Language Models (LLMs) have demonstrated remarkable success across various domains. However, despite their promising performance in numerous real-world applications, most of these models lack fairness considerations. Consequently, they may lead to discriminatory outcomes against certain communities, particularly marginalized populations, which has prompted extensive study of fair LLMs. Unlike fairness in traditional machine learning, fairness in LLMs involves its own distinct backgrounds, taxonomies, and fulfillment techniques. To this end, this survey presents a comprehensive overview of recent advances in the literature on fair LLMs. Specifically, it gives a brief introduction to LLMs, followed by an analysis of the factors that contribute to bias in LLMs. It then discusses the concept of fairness in LLMs category by category, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. It also summarizes resources for evaluating bias in LLMs, including toolkits and datasets. Finally, it discusses existing research challenges and open questions.
