FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models
Hrishikesh Viswanath, Tianyi Zhang
TL;DR
FairPy provides a general, plug-and-play framework to evaluate and mitigate token prediction biases in large language models, addressing model-architecture diversity and reproducibility. It decouples bias metrics from specific models and offers modular implementations of both detection (e.g., Hellinger Distance, WEAT/SEAT, StereoSet, Honest Score, Log Likelihood) and mitigation (e.g., DiffPruning, Null Space Projection, CDA, Self Debias, Dropout) techniques. The work surveys existing bias measurement approaches, demonstrates a practical, extensible toolkit, and presents empirical analysis highlighting dataset and template dependencies, guiding safer deployment of LMs. Overall, Fairpy aims to standardize bias evaluation in NLP pipelines and accelerate robust debiasing practices across diverse language models and applications.
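As an illustration of one of the detection metrics named above, the Hellinger Distance between two token prediction distributions can be sketched as follows. This is a minimal sketch and not Fairpy's own implementation: the function name `hellinger_distance` and the example probabilities are hypothetical, and it assumes both distributions are given over the same candidate tokens in the same order.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are sequences of probabilities over the same tokens,
    in the same order. The result lies in [0, 1]: 0 means the
    distributions are identical, 1 means they share no mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same tokens")
    squared_gap = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)
    )
    return math.sqrt(squared_gap) / math.sqrt(2)


# Hypothetical next-token probabilities over three candidate
# completions for two prompts that differ only in a demographic
# term. These numbers are invented for illustration.
probs_prompt_a = [0.6, 0.3, 0.1]
probs_prompt_b = [0.2, 0.3, 0.5]

gap = hellinger_distance(probs_prompt_a, probs_prompt_b)
```

A larger `gap` indicates that the model's predictions shift more when only the demographic term changes, which is the kind of signal a distribution-based bias metric looks for.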
Abstract
Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction, often inherited from the data distributions present in their training corpora. In response, a number of mathematical frameworks have been proposed to quantify, identify, and mitigate the likelihood of biased token predictions. In this paper, we present a comprehensive survey of such techniques tailored towards widely used LLMs such as BERT and GPT-2. We additionally introduce Fairpy, a modular and extensible toolkit that provides plug-and-play interfaces for integrating these mathematical tools, enabling users to evaluate both pretrained and custom language models. Fairpy supports the implementation of existing debiasing algorithms. The toolkit is open-source and publicly available at: https://github.com/HrishikeshVish/Fairpy
