Fair Mixed Effects Support Vector Machine
Jan Pablo Burgard, João Vitor Pamplona
TL;DR
The paper addresses fair binary classification for cluster-correlated data by introducing the Fair Mixed Effects SVM (FMESVM). It combines a disparate impact constraint with mixed-effects modeling: group-specific random effects $g_i$ are penalized by $\lambda\sum_i g_i^2$ and enter the margins $m_{\beta,g}^{SVM}(x^{ij})$, yielding an optimization problem that balances accuracy, fairness, and the variance due to random effects. In simulated experiments and an application to the real-world Adult dataset, the authors demonstrate that FMESVM, the fair variant of the mixed effects SVM (MESVM), reduces disparate impact with manageable losses in accuracy compared to a standard SVM, and that one-hot encoding is less efficient than the proposed approach. The work advances fairness-aware ML for clustered data, enabling more ethical automated decisions in contexts where observations are not independently sampled.
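To make the ingredients named above concrete, the following is a minimal sketch, not the authors' formulation. It assumes a linear margin of the form $\beta^\top x^{ij} + g_i$, a hinge loss weighted by a hypothetical parameter `C`, the ridge penalty $\lambda\sum_i g_i^2$ on the random effects, and a covariance-style proxy for disparate impact (the covariance between a sensitive attribute `s` and the decision value). All function names and the toy data are illustrative assumptions.

```python
# Illustrative sketch only: names, the linear margin, and the covariance
# proxy for disparate impact are assumptions, not the paper's exact model.

def margin(beta, g, x, i):
    """Decision value beta^T x + g_i for one observation x in cluster i."""
    return sum(b * v for b, v in zip(beta, x)) + g[i]

def objective(beta, g, X, y, cluster, C=1.0, lam=1.0):
    """0.5*||beta||^2 + C * hinge loss + lam * sum_i g_i^2."""
    hinge = sum(
        max(0.0, 1.0 - yk * margin(beta, g, xk, ik))
        for xk, yk, ik in zip(X, y, cluster)
    )
    return 0.5 * sum(b * b for b in beta) + C * hinge + lam * sum(v * v for v in g)

def fairness_covariance(beta, g, X, s, cluster):
    """Empirical covariance between sensitive attribute s and the decision value.
    A fairness constraint would bound its absolute value by a small threshold."""
    s_bar = sum(s) / len(s)
    return sum(
        (sk - s_bar) * margin(beta, g, xk, ik)
        for xk, sk, ik in zip(X, s, cluster)
    ) / len(s)

# Toy data: four observations in two clusters (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
y = [1, -1, 1, -1]          # class labels in {-1, +1}
cluster = [0, 0, 1, 1]      # cluster index i of each observation
s = [0, 1, 0, 1]            # binary sensitive attribute
beta = [1.0, -1.0]          # fixed-effect coefficients
g = [0.5, -0.5]             # one random intercept per cluster

obj_value = objective(beta, g, X, y, cluster)
cov_value = fairness_covariance(beta, g, X, s, cluster)
```

In a sketch like this, a solver would minimize `objective` over `beta` and `g` subject to a bound on the absolute value of `fairness_covariance`; increasing `lam` shrinks the random effects toward zero, which moves the model toward a plain SVM.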
Abstract
To ensure unbiased and ethical automated predictions, fairness must be a core principle in machine learning applications. Fairness in machine learning aims to mitigate biases present in the training data and model imperfections that could lead to discriminatory outcomes. This is achieved by preventing the model from making decisions based on sensitive characteristics like ethnicity or sexual orientation. A fundamental assumption in machine learning is the independence of observations. However, this assumption often does not hold for data describing social phenomena, where data points are often clustered. Hence, if machine learning models do not account for the cluster correlations, the results may be biased. The bias is especially high in cases where the cluster assignment is correlated with the variable of interest. We present a fair mixed effects support vector machine algorithm that can handle both problems simultaneously. With a reproducible simulation study, we demonstrate the impact of clustered data on the quality of fair machine learning predictions.
