Comparing Methods for Bias Mitigation in Graph Neural Networks
Barbara Hoffmann, Ruben Mayer
TL;DR
This paper tackles bias in Graph Neural Network (GNN)-guided data preparation for GenAI by comparing three mitigation strategies on the German Credit dataset: data sparsification, feature modification, and synthetic data augmentation. Stratified sampling provides the most balanced fairness improvements with negligible accuracy loss. GraphSAGE-based augmentation significantly reduces demographic gaps and maintains high accuracy, but it notably increases the disparity in false positive rates. Feature modification yields strong fairness gains, but potential pattern leakage and accuracy trade-offs may limit its real-world applicability. Overall, the work offers practical guidance for deploying fair GNN-enabled data preparation pipelines that preserve task performance.
Abstract
This paper examines the role of Graph Neural Networks (GNNs) in data preparation for generative artificial intelligence (GenAI) systems, with a particular focus on mitigating bias. We present a comparative analysis of three bias mitigation methods: data sparsification, feature modification, and synthetic data augmentation. In experiments on the German Credit dataset, we evaluate these approaches with multiple fairness metrics, including statistical parity, equality of opportunity, and false positive rate. All three methods improve the fairness metrics relative to the original dataset; stratified sampling and GraphSAGE-based synthetic data augmentation prove particularly effective at balancing demographic representation while maintaining model performance. The results provide practical guidance for developing more equitable AI systems.
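To make the three fairness metrics named above concrete, the following is a minimal sketch of how they are commonly computed as gaps between two groups. It assumes binary labels, binary predictions, and a single binary protected attribute; the function names `positive_rate` and `fairness_gaps` are hypothetical, and the exact formulation used in the experiments may differ.

```python
import numpy as np


def positive_rate(y_pred, mask):
    """Share of positive predictions among the samples selected by mask."""
    return float(np.mean(y_pred[mask])) if mask.any() else float("nan")


def fairness_gaps(y_true, y_pred, group):
    """Absolute between-group gaps for three common fairness metrics.

    Assumption: y_true, y_pred, and group are all binary (0/1) arrays,
    where group marks membership in the protected group.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    group = np.asarray(group).astype(bool)

    return {
        # Statistical parity: difference in overall positive-prediction rates.
        "statistical_parity": abs(
            positive_rate(y_pred, group) - positive_rate(y_pred, ~group)
        ),
        # Equality of opportunity: difference in true positive rates.
        "equal_opportunity": abs(
            positive_rate(y_pred, group & y_true)
            - positive_rate(y_pred, ~group & y_true)
        ),
        # False positive rate gap: difference in rates among true negatives.
        "false_positive_rate": abs(
            positive_rate(y_pred, group & ~y_true)
            - positive_rate(y_pred, ~group & ~y_true)
        ),
    }


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the German Credit dataset.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
    group = [1, 1, 1, 1, 0, 0, 0, 0]
    print(fairness_gaps(y_true, y_pred, group))
```

A gap of zero on a metric means the two groups are treated identically by that criterion. The toy example shows that the metrics can disagree: both groups receive positive predictions at the same rate, yet their true positive and false positive rates differ, which is why the evaluation reports several metrics rather than one.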
