Fairness in Large Language Models in Three Hours

Thang Doan Viet; Zichong Wang; Minh Nhat Nguyen; Wenbin Zhang

TL;DR

The paper surveys fairness in large language models (LLMs), covering bias sources, fairness notions, evaluation methodologies, and mitigation strategies, and organizes them into a structured, interactive tutorial. It details a half-day agenda with case studies, discussions, and hands-on exposure to toolkits and datasets, complemented by post-tutorial materials. Key contributions include a taxonomy of bias sources (training data, embeddings, and labels), mitigation stages (pre-, in-, intra-, and post-processing), and a catalog of evaluation resources and datasets. Overall, the work aims to give researchers and practitioners concrete guidance for reducing discriminatory outcomes in LLMs and to catalyze interdisciplinary efforts toward safer, fairer NLP systems.

Abstract

Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Unlike fairness in traditional machine learning, fairness in LLMs involves unique backgrounds, taxonomies, and fulfillment techniques. This tutorial provides a systematic overview of recent advances in the literature concerning fair LLMs, beginning with real-world case studies to introduce LLMs, followed by an analysis of bias causes therein. The concept of fairness in LLMs is then explored, summarizing the strategies for evaluating bias and the algorithms designed to promote fairness. Additionally, resources for assessing bias in LLMs, including toolkits and datasets, are compiled, and current research challenges and open questions in the field are discussed. The repository is available at https://github.com/LavinWong/Fairness-in-Large-Language-Models.
