
A Fair Post-Processing Method based on the MADD Metric for Predictive Student Models

Mélina Verger, Chunyang Fan, Sébastien Lallé, François Bouchet, Vanda Luengo

TL;DR

This work addresses fairness in predictive student models. Building on the Model Absolute Density Distance (MADD) fairness metric, it proposes a post-processing method that adjusts predicted probabilities via a mapping controlled by a fairness coefficient $\lambda$. The method moves each group's density distribution toward a target distribution using a convex combination, and selects $\lambda$ by optimizing a joint objective $\mathcal{L}(\lambda)$ that balances accuracy and fairness. Experiments on simulated data and the Open University Learning Analytics Dataset (OULAD) show substantial fairness improvements with only modest losses in predictive accuracy. The approach is also practical, since it does not require access to the original training data or model. The authors provide open-source code and data on GitHub, and highlight the method's potential for real-world deployment and for future extension to multiple sensitive attributes.
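The mapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes histogram-based PDFs on fixed bins over $[0, 1]$ and takes the convex combination $(1-\lambda) f_g + \lambda f_T$ in density space. Each prediction is then moved by quantile matching, $x \mapsto F_{\lambda}^{-1}(F_g(x))$. Several details are hypothetical: the function names `histogram_pdf`, `cdf_at_edges` and `post_process`, the 20-bin grid, the simulated Beta-distributed scores, and the choice of the pooled distribution of both groups as the target $f_T$.

```python
import numpy as np


def histogram_pdf(samples, edges):
    """Histogram-based PDF on fixed bins; density=True makes it integrate to 1."""
    pdf, _ = np.histogram(samples, bins=edges, density=True)
    return pdf


def cdf_at_edges(pdf, edges):
    """CDF of a piecewise-constant PDF, evaluated at the bin edges (0 ... 1)."""
    return np.concatenate(([0.0], np.cumsum(pdf * np.diff(edges))))


def post_process(proba, group_pdf, target_pdf, edges, lam):
    """Hypothetical sketch of a lambda-controlled post-processing for one group.

    The adjusted density is the convex combination
    (1 - lam) * group_pdf + lam * target_pdf, and each prediction x is moved
    by quantile matching: x -> F_mixed^{-1}(F_group(x)).
    """
    mixed_pdf = (1.0 - lam) * group_pdf + lam * target_pdf
    # u = F_group(x): approximately uniform on [0, 1] (probability integral transform)
    u = np.interp(proba, edges, cdf_at_edges(group_pdf, edges))
    # Inverse of the piecewise-linear mixed CDF (monotone, so order is kept)
    return np.interp(u, cdf_at_edges(mixed_pdf, edges), edges)


# Toy usage on simulated scores (hypothetical data, not the paper's)
rng = np.random.default_rng(0)
edges = np.linspace(0.0, 1.0, 21)
proba_g0 = rng.beta(2, 5, size=5000)  # group G0 tends to get low scores
proba_g1 = rng.beta(5, 2, size=5000)  # group G1 tends to get high scores
pdf_g0 = histogram_pdf(proba_g0, edges)
pdf_g1 = histogram_pdf(proba_g1, edges)
# Hypothetical target: the pooled distribution of both groups
target_pdf = histogram_pdf(np.concatenate([proba_g0, proba_g1]), edges)

for lam in (0.0, 0.5, 1.0):
    new_g0 = post_process(proba_g0, pdf_g0, target_pdf, edges, lam)
    new_g1 = post_process(proba_g1, pdf_g1, target_pdf, edges, lam)
    print(f"lambda={lam}: mean gap = {abs(new_g1.mean() - new_g0.mean()):.3f}")
```

With $\lambda = 0$, the predictions come back unchanged. As $\lambda$ grows, both groups' scores are pulled toward the same target, so the gap between the group means shrinks. Because both interpolations are non-decreasing, the ordering of students within a group is preserved. Choosing $\lambda$ through $\mathcal{L}(\lambda)$ would sit on top of a function like this one.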

Abstract

Predictive student models are increasingly used in learning environments. However, due to the rising social impact of their usage, it is all the more important for these models to be both sufficiently accurate and fair in their predictions. To evaluate algorithmic fairness, a new metric has been developed in education, namely the Model Absolute Density Distance (MADD). This metric measures how differently a predictive model behaves toward two groups of students, in order to quantify its algorithmic unfairness. In this paper, we develop a post-processing method based on this metric that aims to improve fairness while preserving the accuracy of the predictive models' results. We evaluate our approach on the task of predicting student success in an online course, using both simulated and real-world educational data, and obtain successful results. Our source code and data are openly available at https://github.com/melinaverger/MADD .
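The metric itself is simple to compute from two lists of predicted probabilities. This is a minimal sketch that assumes, as in Figure 1, that MADD sums the absolute differences between the shares of each group's predictions falling in each of a set of equal-width bins on $[0, 1]$. The function name `madd` and the default `n_bins=100` are hypothetical choices, not taken from the authors' repository.

```python
import numpy as np


def madd(proba_g0, proba_g1, n_bins=100):
    """Sketch of the MADD between two groups' predicted probabilities.

    Assumes MADD is the sum, over equal-width bins on [0, 1], of the absolute
    difference between the share of each group's predictions in that bin.
    0 means identical histograms; 2 means no overlapping bins at all.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts_g0, _ = np.histogram(proba_g0, bins=edges)
    counts_g1, _ = np.histogram(proba_g1, bins=edges)
    share_g0 = counts_g0 / counts_g0.sum()  # proportion of G0 per bin
    share_g1 = counts_g1 / counts_g1.sum()  # proportion of G1 per bin
    return float(np.abs(share_g0 - share_g1).sum())


print(madd([0.1, 0.2, 0.3], [0.1, 0.2, 0.3], n_bins=10))  # identical groups -> 0.0
print(madd([0.1, 0.2], [0.8, 0.9], n_bins=10))            # separated groups -> 2.0
```

Under this definition, MADD ranges from 0 (identical histograms) to 2 (no bin shared by the two groups). The number of bins controls how finely the two distributions are compared.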

Paper Structure (22 sections, 2 theorems, 11 equations, 7 figures, 1 table)

Key Result

Theorem (probability integral transform)

Let $\mathcal{A}$ be a continuous distribution and $F_{\mathcal{A}}$ its cumulative distribution function. If $X$ follows the distribution $\mathcal{A}$, i.e., $X \sim \mathcal{A}$, then $F_{\mathcal{A}}(X) \sim \mathcal{U}_{[0, 1]}$, where $\mathcal{U}_{[0,1]}$ is the uniform distribution over $[0, 1]$.
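The theorem is easy to check numerically. This minimal sketch uses NumPy and an arbitrarily chosen exponential distribution with rate 1, whose CDF is $F(x) = 1 - e^{-x}$. Applying $F$ to samples drawn from that distribution should give values spread evenly over $[0, 1]$.

```python
import numpy as np

# Numerical check of the theorem with an exponential distribution (rate 1),
# whose CDF is F(x) = 1 - exp(-x); the choice of distribution is arbitrary.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=100_000)  # X ~ A
u = 1.0 - np.exp(-x)                          # F_A(X)

# If F_A(X) ~ U[0, 1], each of 10 equal-width bins holds about 10% of the values
shares, _ = np.histogram(u, bins=10, range=(0.0, 1.0))
shares = shares / u.size
print(np.round(shares, 3))
print(f"mean={u.mean():.3f} (uniform: 0.5), var={u.var():.4f} (uniform: 1/12 = 0.0833)")
```

Run in reverse through an inverse CDF, the same property allows values to be moved from one distribution to another by quantile matching. We assume this is the role the theorem plays in the post-processing.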

Figures (7)

  • Figure 1: Representations of the MADD from [ref_madd]. (a) Proportions of predicted probabilities for group $G_0$. (b) Same for group $G_1$. (c) Visual approximation of the MADD (red area), obtained by smoothing histograms (a) and (b) for easier interpretation. The smoothing was done by kernel density estimation [ref_madd].
  • Figure 2: MADD post-processing principle. Example of two distributions of predicted probabilities (as in Fig. 1(a) and 1(b)), before and after the MADD post-processing.
  • Figure 3: MADD post-processing approach. (a) Illustration of the different distributions. (b) Illustration of a PDF based on a histogram. (c) Linear relationship between the PDFs (continuous space).
  • Figure 4: MADD post-processing.
  • Figure 5: Effect of the MADD post-processing on the predicted probabilities with increasing values of $\lambda$.
  • Figures 6 and 7: captions not available.
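If the post-processing combines the PDFs linearly, as Figure 3(c) suggests, a short computation shows why increasing $\lambda$ shrinks the MADD (cf. Figure 5). This sketch assumes that both groups are pulled toward the same target $T$ with the same $\lambda$. It also assumes the MADD is measured on the same bins, with $d_{g,k}$ denoting the share of group $g$'s predictions in bin $k$.

```latex
\begin{aligned}
d^{\lambda}_{g,k} &= (1-\lambda)\, d_{g,k} + \lambda\, d_{T,k}, \qquad g \in \{0, 1\},\\
d^{\lambda}_{0,k} - d^{\lambda}_{1,k} &= (1-\lambda)\,\bigl(d_{0,k} - d_{1,k}\bigr),\\
\mathrm{MADD}^{\lambda} = \sum_k \bigl|d^{\lambda}_{0,k} - d^{\lambda}_{1,k}\bigr|
  &= (1-\lambda) \sum_k \bigl|d_{0,k} - d_{1,k}\bigr| = (1-\lambda)\,\mathrm{MADD}.
\end{aligned}
```

Under these assumptions, $\lambda = 0$ leaves both distributions unchanged, and $\lambda = 1$ makes them identical ($\mathrm{MADD}^{\lambda} = 0$). The target term $d_{T,k}$ cancels out, so the choice of target affects accuracy but not this fairness measure. This is presumably why the accuracy part of $\mathcal{L}(\lambda)$ is needed to settle on an intermediate $\lambda$.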

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2