Learning Fair Representations with Kolmogorov-Arnold Networks
Amisha Priyadarshini, Sergio Gago-Masague
TL;DR
This work tackles fairness in high-stakes college admissions by integrating Kolmogorov-Arnold Networks (KANs) into an adversarial debiasing framework to produce fair, interpretable representations. The authors prove that KANs are Lipschitz continuous and $\beta$-smooth, which enables stable adversarial optimization, and they introduce an adaptive update mechanism for the fairness penalty $\lambda$ that balances fairness and accuracy during training. Empirical results on two real-world admissions datasets show that KAN-based debiasing with adaptive $\lambda$ consistently improves fairness metrics (Demographic Parity and $p\%$-Rule) while preserving or improving predictive performance relative to state-of-the-art baselines, with ADOPT often delivering the best trade-off. The work highlights the practical potential of spline-based, interpretable architectures for fairness-aware decision-making and suggests future directions in feature-level bias detection and broader fairness definitions.
Abstract
Despite recent advances in fairness-aware machine learning, predictive models often exhibit discriminatory behavior towards marginalized groups. Such unfairness can arise from biased training data, model design, or representational disparities across groups, and it poses significant challenges in high-stakes decision-making domains such as college admissions. While existing fair learning models aim to mitigate bias, achieving an optimal trade-off between fairness and accuracy remains a challenge. Moreover, the reliance on black-box models hinders interpretability, limiting their applicability in socially sensitive domains. To address these issues, we propose integrating Kolmogorov-Arnold Networks (KANs) within a fair adversarial learning framework. Leveraging the adversarial robustness and interpretability of KANs, our approach facilitates stable adversarial learning. We derive theoretical insights into the spline-based KAN architecture that ensure stability during adversarial optimization. Additionally, we propose an adaptive fairness penalty update mechanism to strike a balance between fairness and accuracy. We back these findings with empirical evidence on two real-world admissions datasets, demonstrating the proposed framework's effectiveness in achieving fairness across sensitive attributes while preserving predictive performance.
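For readers unfamiliar with the two fairness metrics named in the TL;DR, the following is a minimal sketch of how Demographic Parity and the $p\%$-Rule are commonly computed for binary predictions and a binary sensitive attribute. It uses the standard textbook definitions and is not taken from the authors' implementation. The function names, the 0/1 encoding of the sensitive attribute, and the toy data are assumptions made for illustration.

```python
import numpy as np


def group_positive_rates(y_pred, sensitive):
    """Return P(y_pred = 1) for the groups sensitive == 1 and sensitive == 0.

    Assumes binary predictions and a binary sensitive attribute (illustrative
    encoding), with at least one sample in each group.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    sensitive = np.asarray(sensitive)
    rate_1 = y_pred[sensitive == 1].mean()
    rate_0 = y_pred[sensitive == 0].mean()
    return rate_1, rate_0


def demographic_parity_difference(y_pred, sensitive):
    """Absolute gap in positive-prediction rates; 0 means perfect parity."""
    rate_1, rate_0 = group_positive_rates(y_pred, sensitive)
    return abs(rate_1 - rate_0)


def p_percent_rule(y_pred, sensitive):
    """Ratio of the smaller to the larger positive rate, as a percentage.

    100 means both groups receive positive predictions at the same rate.
    If neither group receives any positive prediction, the rates are equal,
    so this sketch returns 100 by convention.
    """
    rate_1, rate_0 = group_positive_rates(y_pred, sensitive)
    larger = max(rate_1, rate_0)
    if larger == 0:
        return 100.0
    return 100.0 * min(rate_1, rate_0) / larger


# Toy example (hypothetical data): group 1 is admitted at 75%, group 0 at 25%.
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_sensitive = [1, 1, 1, 1, 0, 0, 0, 0]
dp_gap = demographic_parity_difference(toy_pred, toy_sensitive)
p_rule = p_percent_rule(toy_pred, toy_sensitive)
```

In this sketch a smaller Demographic Parity gap and a larger $p\%$-Rule value both indicate that positive predictions are distributed more evenly across the two groups.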
