Uncovering Fairness through Data Complexity as an Early Indicator
Juliett Suárez Ferreira, Marija Slavkovik, Jorge Casillas
TL;DR
This study investigates whether differences in subgroup data complexity between privileged and unprivileged groups can serve as early indicators of algorithmic unfairness. It combines a large-scale synthetic bias framework (73 datasets) with three classifiers (LR, DT, KN) and association-rule mining to link complexity gaps to SP, EO, and PP, and it validates the results on 30 real-world datasets. The authors identify consistent patterns, especially in class imbalance (C2), boundary overlap (N1), and local density (density), that correlate with fairness violations, and they show that complexity-difference signals can guide pre-processing and model choices to mitigate bias. The findings advocate for routine complexity audits in the ML pipeline, offering data-centric indicators that help practitioners anticipate and address fairness challenges in practice. The complexity difference is defined as $CMD = |\text{complexity}_{\text{privileged}} - \text{complexity}_{\text{unprivileged}}|$, and the fair interval is $[-0.1, 0.1]$.
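As a rough illustration of the CMD definition above, the sketch below computes the gap in class imbalance (C2) between a privileged and an unprivileged group. It assumes C2 follows the common imbalance-ratio formulation, C2 = 1 - 1/IR with IR = ((n_c - 1)/n_c) * sum_i n_i/(n - n_i); the function names `c2_imbalance` and `cmd_c2` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def c2_imbalance(labels):
    """Class-imbalance complexity, assumed here to be C2 = 1 - 1/IR.

    IR = ((n_c - 1) / n_c) * sum_i n_i / (n - n_i), where n_i is the size
    of class i, n the number of samples, and n_c the number of classes.
    Returns 0 for perfectly balanced classes and approaches 1 as the
    imbalance grows.
    """
    counts = Counter(labels)
    n = len(labels)
    n_c = len(counts)
    if n_c < 2:
        raise ValueError("C2 needs at least two classes in the group")
    ir = (n_c - 1) / n_c * sum(n_i / (n - n_i) for n_i in counts.values())
    return 1.0 - 1.0 / ir


def cmd_c2(labels, groups, privileged):
    """Absolute C2 difference between privileged and unprivileged samples."""
    priv = [y for y, g in zip(labels, groups) if g == privileged]
    unpriv = [y for y, g in zip(labels, groups) if g != privileged]
    return abs(c2_imbalance(priv) - c2_imbalance(unpriv))


# Toy data (made up): the privileged group "a" is balanced (5 vs 5),
# the unprivileged group "b" is heavily imbalanced (1 vs 9).
labels = [1] * 5 + [0] * 5 + [1] * 1 + [0] * 9
groups = ["a"] * 10 + ["b"] * 10
gap = cmd_c2(labels, groups, privileged="a")
print(f"CMD on C2: {gap:.3f}")
```

The same wrapper could take any per-group complexity measure in place of `c2_imbalance`; only the absolute difference between the two group values is specific to CMD.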
Abstract
Fairness is a concern in machine learning (ML) applications. To date, no study has examined how disparities in classification complexity between privileged and unprivileged groups influence the fairness of solutions, even though such disparities could serve as a preliminary indicator of potential unfairness. In this work, we address this gap. Specifically, we use synthetic datasets designed to capture a variety of biases, ranging from historical bias to measurement and representational bias, to evaluate how differences in complexity metrics correlate with group fairness metrics. We then apply association rule mining to identify patterns that link disproportionate complexity differences between groups with fairness-related outcomes, offering data-centric indicators to guide bias mitigation. We also validate our findings on real-world problems, providing evidence that quantifying group-wise classification complexity can uncover early indicators of potential fairness challenges. This investigation helps practitioners proactively address bias in classification tasks.
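To make the notion of a group fairness metric concrete, here is a minimal sketch of a statistical parity (SP) difference, computed as the positive-prediction rate of the unprivileged group minus that of the privileged group. The sign convention, the helper names `statistical_parity_difference` and `within_fair_interval`, and the use of the [-0.1, 0.1] interval from the TL;DR as a tolerance on SP are assumptions made for illustration, not details confirmed by the abstract.

```python
def statistical_parity_difference(y_pred, groups, privileged):
    """Positive-prediction rate of unprivileged minus privileged samples.

    y_pred holds binary predictions (0 or 1); groups holds one group
    label per prediction. A value of 0 means both groups receive
    positive predictions at the same rate.
    """
    priv = [p for p, g in zip(y_pred, groups) if g == privileged]
    unpriv = [p for p, g in zip(y_pred, groups) if g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups must contain at least one sample")
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


def within_fair_interval(value, tolerance=0.1):
    """True if the metric lies in [-tolerance, tolerance] (assumed bound)."""
    return -tolerance <= value <= tolerance


# Toy predictions (made up): group "a" gets 3/4 positives, group "b" 1/4.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
spd = statistical_parity_difference(y_pred, groups, privileged="a")
print(spd, within_fair_interval(spd))
```

Pairing a value like this with a per-group complexity gap for each dataset gives the kind of (complexity difference, fairness outcome) observations that a correlation analysis or rule-mining step would consume.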
