Fair Mixed Effects Support Vector Machine

Jan Pablo Burgard; João Vitor Pamplona

TL;DR

The paper addresses fair binary classification for cluster-correlated data by introducing the Fair Mixed Effects SVM (FMESVM). The method combines a disparate impact constraint with mixed-effects modeling: group-specific random effects $g_i$ enter the margins $m_{\beta,g}^{SVM}(x^{ij})$ and are penalized by $\lambda\sum_i g_i^2$, giving an optimization problem that balances accuracy, fairness, and the variance due to random effects. In simulated experiments and an application to the real-world Adult dataset, the authors show that FMESVM, the fair variant of the mixed effects SVM (MESVM), reduces disparate impact with manageable losses in accuracy compared to standard SVM, and that one-hot encoding is less efficient than the proposed approach. The work advances fairness-aware ML for clustered data, enabling more ethical automated decisions in contexts where observations are not independently sampled.
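The ingredients named in the summary above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the margin is taken to be a random-intercept form $\beta^\top x^{ij} + g_i$, the loss is the standard hinge loss with a hypothetical weight `C`, and disparate impact is measured by the covariance between a sensitive attribute `s` and the margin, which is one common proxy and may differ from the paper's exact constraint. The function names are hypothetical.

```python
def margins(beta, g, X, cluster):
    """Assumed margin beta^T x + g_i, where i is the cluster of each observation."""
    return [sum(b * x for b, x in zip(beta, row)) + g[c]
            for row, c in zip(X, cluster)]


def penalized_objective(beta, g, X, y, cluster, C=1.0, lam=1.0):
    """Hinge loss plus ridge term on beta plus lam * sum_i g_i^2 (assumed form)."""
    m = margins(beta, g, X, cluster)
    hinge = sum(max(0.0, 1.0 - yi * mi) for yi, mi in zip(y, m))
    ridge = 0.5 * sum(b * b for b in beta)
    random_effect_penalty = lam * sum(gi * gi for gi in g)
    return ridge + C * hinge + random_effect_penalty


def di_covariance(beta, g, X, s, cluster):
    """Covariance proxy for disparate impact: mean of (s - mean(s)) * margin."""
    m = margins(beta, g, X, cluster)
    s_bar = sum(s) / len(s)
    return sum((si - s_bar) * mi for si, mi in zip(s, m)) / len(s)


# Toy data (made up): four points, two clusters, binary sensitive attribute s.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [1, 1, -1, -1]
s = [1, 0, 1, 0]
cluster = [0, 0, 1, 1]
beta = [1.0, 1.0]
g = [0.5, -0.5]

obj = penalized_objective(beta, g, X, y, cluster)
cov = di_covariance(beta, g, X, s, cluster)
print(obj, cov)
```

In a fair variant of this sketch, one would minimize `penalized_objective` subject to `abs(di_covariance(...)) <= c` for a small hypothetical threshold `c`; the solver is omitted here because the summary does not specify it.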

Abstract

To ensure unbiased and ethical automated predictions, fairness must be a core principle in machine learning applications. Fairness in machine learning aims to mitigate biases present in the training data and model imperfections that could lead to discriminatory outcomes. This is achieved by preventing the model from making decisions based on sensitive characteristics like ethnicity or sexual orientation. A fundamental assumption in machine learning is the independence of observations. However, this assumption often does not hold for data describing social phenomena, where data points are often clustered. Hence, if machine learning models do not account for the cluster correlations, the results may be biased. The bias is especially high when the cluster assignment is correlated with the variable of interest. We present a fair mixed effects support vector machine algorithm that can handle both problems simultaneously. With a reproducible simulation study, we demonstrate the impact of clustered data on the quality of fair machine learning predictions.

Paper Structure (6 sections, 19 equations, 8 figures)

This paper contains 6 sections, 19 equations, 8 figures.

Introduction
Fair Support Vector Machine
Fair Mixed Effects Support Vector Machine
Simulation Study
Application
Conclusion

Figures (8)

Figure 1: Regular SVM.
Figure 2: SVM free of disparate impact.
Figure 3: Memory comparison in bytes.
Figure 4: Time comparison in microseconds.
Figure 5: Accuracy.
...and 3 more figures
