FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models

Hrishikesh Viswanath; Tianyi Zhang

TL;DR

FairPy provides a general, plug-and-play framework for evaluating and mitigating token prediction biases in large language models. It is designed to work across diverse model architectures and to make bias evaluations reproducible. It decouples bias metrics from specific models and offers modular implementations of both detection techniques (e.g., Hellinger Distance, WEAT/SEAT, StereoSet, Honest Score, Log Likelihood) and mitigation techniques (e.g., DiffPruning, Null Space Projection, Counterfactual Data Augmentation (CDA), Self Debias, Dropout). The work surveys existing bias measurement approaches, presents a practical, extensible toolkit, and reports an empirical analysis showing how bias scores depend on the choice of dataset and template, with the aim of guiding safer deployment of LMs. Overall, FairPy aims to standardize bias evaluation in NLP pipelines and to accelerate the adoption of robust debiasing practices across diverse language models and applications.
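A minimal sketch of two of the detection metrics named above, written in plain Python and independent of FairPy's own code: the Hellinger distance between the next-token distributions a model produces for two demographic variants of a prompt, and the WEAT effect size over embedding vectors. The function names, the toy probabilities, and the use of the sample standard deviation in the WEAT denominator are assumptions for illustration (implementations differ on that last choice). SEAT applies the same effect size to sentence embeddings obtained from templates.

```python
import math
import statistics
from typing import Dict, List, Sequence

Vector = Sequence[float]


def hellinger_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Hellinger distance between two next-token distributions.

    Tokens absent from one distribution are treated as having probability 0.
    The result lies in [0, 1]; 0 means the distributions are identical.
    """
    vocab = set(p) | set(q)
    squared = sum((math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2 for t in vocab)
    return math.sqrt(squared) / math.sqrt(2)


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vector, A: List[Vector], B: List[Vector]) -> float:
    # s(w, A, B): mean cosine similarity of w to attribute set A minus that to B
    return statistics.mean(cosine(w, a) for a in A) - statistics.mean(cosine(w, b) for b in B)


def weat_effect_size(X: List[Vector], Y: List[Vector], A: List[Vector], B: List[Vector]) -> float:
    """Difference in mean association of target sets X and Y with attribute
    sets A and B, normalised by the sample standard deviation of the
    association over all target vectors in X and Y (an assumed choice)."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Invented next-token probabilities for "The man worked as a" and
# "The woman worked as a"; a real evaluation would query the model.
p_man = {"doctor": 0.5, "nurse": 0.1, "teacher": 0.4}
p_woman = {"doctor": 0.2, "nurse": 0.4, "teacher": 0.4}
print(round(hellinger_distance(p_man, p_woman), 3))
```

In a real evaluation the dictionaries would come from a model's softmax output over the vocabulary and the WEAT vectors from the model's embeddings; for both scores, larger magnitudes indicate a larger disparity between the groups being compared.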

Abstract

Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction, often inherited from the data distributions present in their training corpora. In response, a number of mathematical frameworks have been proposed to quantify and identify biased token predictions and to reduce their likelihood. In this paper, we present a comprehensive survey of such techniques as applied to widely used LLMs such as BERT and GPT-2. We also introduce FairPy, a modular and extensible toolkit that provides plug-and-play interfaces for integrating these mathematical tools, enabling users to evaluate both pretrained and custom language models. FairPy also includes implementations of existing debiasing algorithms. The toolkit is open-source and publicly available at https://github.com/HrishikeshVish/Fairpy.
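The plug-and-play idea of decoupling metrics from models can be sketched as follows. This is a hypothetical illustration, not FairPy's API: the `SentenceScorer` type, the `BiasMetric` protocol, `PairwisePreference`, and `evaluate` are invented names, and the sentence scores are invented numbers standing in for a real model. A metric depends only on a scoring callable, so one metric object could be reused for BERT, GPT-2, or a custom model once each is wrapped to return a sentence log-likelihood. The metric itself is a simplified pairwise log-likelihood comparison, not the exact StereoSet or Log Likelihood formulation.

```python
from typing import Callable, Dict, List, Protocol, Tuple

# Hypothetical abstraction: a "model" is any callable that returns the
# log-likelihood of a sentence (e.g. a wrapped masked or causal LM).
SentenceScorer = Callable[[str], float]


class BiasMetric(Protocol):
    """Anything with a score(model) method can be plugged into evaluate()."""

    def score(self, model: SentenceScorer) -> float:
        ...


class PairwisePreference:
    """Fraction of (stereotypical, anti-stereotypical) sentence pairs for which
    the model assigns the stereotypical sentence a strictly higher
    log-likelihood. 0.5 means no systematic preference on the pair set."""

    def __init__(self, pairs: List[Tuple[str, str]]):
        self.pairs = pairs

    def score(self, model: SentenceScorer) -> float:
        wins = sum(1 for stereo, anti in self.pairs if model(stereo) > model(anti))
        return wins / len(self.pairs)


def evaluate(model: SentenceScorer, metrics: Dict[str, BiasMetric]) -> Dict[str, float]:
    # The same metric objects work with any model that fits SentenceScorer.
    return {name: metric.score(model) for name, metric in metrics.items()}


# Invented log-likelihoods standing in for a real model's output.
toy_scores = {
    "He is an engineer": -12.1, "She is an engineer": -13.4,
    "She is a nurse": -11.8, "He is a nurse": -13.0,
    "He is a chef": -12.9, "She is a chef": -12.5,
}


def toy_model(sentence: str) -> float:
    return toy_scores[sentence]


pairs = [
    ("He is an engineer", "She is an engineer"),
    ("She is a nurse", "He is a nurse"),
    ("He is a chef", "She is a chef"),
]
result = evaluate(toy_model, {"pairwise_ll": PairwisePreference(pairs)})
print(result)
```

With these invented scores the toy model prefers the stereotypical sentence in two of the three pairs, so the metric returns 2/3. Evaluating a different model would only require a new scorer function; the metric code stays unchanged.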
