Reducing Large Language Model Bias with Emphasis on 'Restricted Industries': Automated Dataset Augmentation and Prejudice Quantification

Devam Mondal; Carlo Lipizzi

TL;DR

This paper proposes a novel, automated mechanism for debiasing through specified dataset augmentation, viewed through the lens of bias producers and set in the context of 'restricted industries' with limited data.

Abstract

Despite the growing capabilities of large language models, there exist concerns about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation, viewed through the lens of bias producers and set in the context of 'restricted industries' with limited data. We additionally create two new metrics, the mb-index and db-index, to quantify bias, reflecting the idea that bias arises from both the intrinsic model architecture and the dataset.

Paper Structure

This paper contains 15 sections, 6 equations, 2 figures, 2 tables.

Introduction
Literature Review
Cluster 1: Types and Examples of Bias in the Realm of LLMs
Cluster 2: Bias in Application of LLMs in Restricted Industries
Cluster 3: Dataset Bias in the Realm of LLMs
Cluster 4: Inherent Bias in LLM Architectures
Addressing and Remediating Bias
Approach
Dataset Augmentation
LLM Bias Classification
Dataset Bias Classification
Method
Results and Discussion
Limitations and Future Research Directions
Conclusion

Figures (2)

Figure 1: Obtaining semantically homogeneous clusters through k-means clustering and hyperparameter grid search.
Figure 2: Obtaining semantically homogeneous clusters through k-means clustering and hyperparameter grid search.
