When Machine Learning Gets Personal: Understanding Fairness of Personalized Models

Louisa Cornelis, Guillermo Bernárdez, Haewon Jeong, Nina Miolane

TL;DR

This paper tackles the question of when personalization in machine learning benefits both predictive accuracy and explanation quality. It introduces a unified Benefit of Personalization (BoP) framework for classification and regression. The framework defines the population cost $C(h)$, the group cost $C(h,s)$, and the BoP metric $\operatorname{BoP}(h_0,h_p)=C(h_0)-C(h_p)$, and it covers both prediction (BoP-P) and explanations (BoP-X). The authors derive information-theoretic lower bounds on the probability of error when testing whether personalization yields a benefit. These include specializations to exponential-family distributions and an upper bound on the number of personal attributes, expressed via the Lambert W function. The authors also show that regression can tolerate more personal attributes than classification. Experiments on MIMIC-III demonstrate heterogeneous effects: personalization can improve accuracy and explanations in some subgroups while offering limited or mixed gains in others. This underscores the need to evaluate both prediction and explainability in healthcare settings. Overall, the framework provides practical tools to balance accuracy, fairness, and interpretability in personalized models, guiding safer deployment of personalized ML in high-stakes contexts.
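The cost and BoP definitions above can be made concrete with a minimal sketch. It assumes 0-1 loss as the classification cost and uses empirical averages in place of expectations. The function names are hypothetical. Taking the minimal group BoP (Definition 3.3) as the minimum of the per-group BoPs is our reading of that definition's name, not its verified statement.

```python
import numpy as np


def zero_one_loss(y, y_hat):
    """0-1 loss for classification (1.0 where the prediction is wrong)."""
    return (y != y_hat).astype(float)


def cost(y, y_hat, loss):
    """Population cost C(h): average loss over the whole sample."""
    return float(np.mean(loss(y, y_hat)))


def group_cost(y, y_hat, groups, s, loss):
    """Group cost C(h, s): average loss over samples whose attribute equals s."""
    mask = groups == s
    return float(np.mean(loss(y[mask], y_hat[mask])))


def bop(y, pred_h0, pred_hp, loss):
    """BoP(h0, hp) = C(h0) - C(hp); positive values favour the personalized model."""
    return cost(y, pred_h0, loss) - cost(y, pred_hp, loss)


def group_bops(y, pred_h0, pred_hp, groups, loss):
    """Per-group BoP: C(h0, s) - C(hp, s) for every observed group s."""
    return {
        int(s): group_cost(y, pred_h0, groups, s, loss)
        - group_cost(y, pred_hp, groups, s, loss)
        for s in np.unique(groups)
    }


# Toy data (made up for illustration): 6 samples, binary attribute S in {0, 1}.
y = np.array([1, 0, 1, 1, 0, 0])
pred_h0 = np.array([1, 1, 1, 0, 0, 0])  # generic model: 2 errors
pred_hp = np.array([1, 0, 1, 1, 0, 1])  # personalized model: 1 error
groups = np.array([0, 0, 0, 1, 1, 1])

population_bop = bop(y, pred_h0, pred_hp, zero_one_loss)
per_group = group_bops(y, pred_h0, pred_hp, groups, zero_one_loss)
minimal_group_bop = min(per_group.values())
```

In this toy case the population BoP is positive (1/6), but group $S = 1$ gains nothing, so the minimal group BoP is 0. A positive population-level gain can therefore hide a group that does not benefit. For regression, swapping in a squared-error loss leaves the rest unchanged.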

Abstract

Personalization in machine learning involves tailoring models to individual users by incorporating personal attributes such as demographic or medical data. While personalization can improve prediction accuracy, it may also amplify biases and reduce explainability. This work introduces a unified framework to evaluate the impact of personalization on both prediction accuracy and explanation quality across classification and regression tasks. We derive novel upper bounds for the number of personal attributes that can be used to reliably validate benefits of personalization. Our analysis uncovers key trade-offs. We show that regression models can potentially utilize more personal attributes than classification models. We also demonstrate that improvements in prediction accuracy due to personalization do not necessarily translate into enhanced explainability, underscoring the importance of evaluating both metrics when personalizing machine learning models in critical settings such as healthcare. Validated with a real-world dataset, this framework offers practical guidance for balancing accuracy, fairness, and interpretability in personalized models.

Paper Structure

This paper contains 46 sections, 17 theorems, 105 equations, 9 figures, 2 tables.

Key Result

Theorem 4.1

There exists a data distribution $P_{\mathbf{X}, \mathbf{S}, Y}$ such that the Bayes optimal classifiers $h_0$ and $h_p$ satisfy $\text{BoP-P}(h_0, h_p) = 0$ and $\text{BoP-X}(h_0, h_p) > 0$.
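Theorem 4.1 can be checked numerically on the construction described in the Figure 2 caption. The sketch below is a Monte Carlo illustration, not the paper's proof. Taking $X_1, X_2$ as independent standard normals is our assumption. Sufficiency and comprehensiveness are stood in for by plain accuracies: accuracy with only the top feature kept, and accuracy with the top feature removed, with masked inputs set to 0. These stand-ins were chosen because they reproduce the orderings that caption describes. The paper's exact definitions are in its table of costs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Construction from Figure 2: S = X1 + X2 and Y = 1[X1 + X2 > 0].
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
s = x1 + x2
y = (s > 0).astype(int)


def h0(x1, x2):
    """Generic model: decision boundary X1 + X2 > 0."""
    return (x1 + x2 > 0).astype(int)


def hp(s):
    """Personalized model: relies only on S > 0 and ignores X1, X2."""
    return (s > 0).astype(int)


def hp_without_s(size, rng):
    """hp with S removed: it never learned to use X, so it guesses at random."""
    return rng.integers(0, 2, size=size)


def accuracy(pred):
    return float(np.mean(pred == y))


zeros = np.zeros(n)  # masked features are set to their mean, 0 (assumption)

# Prediction: both models are exact here, so BoP-P = 0 under 0-1 loss.
acc_h0, acc_hp = accuracy(h0(x1, x2)), accuracy(hp(s))

# Sufficiency (assumed: accuracy when only the top feature is kept).
suff_h0 = accuracy(h0(x1, zeros))  # h0 keeps X1 only
suff_hp = accuracy(hp(s))          # hp keeps S only

# Comprehensiveness (assumed: accuracy when the top feature is removed).
comp_h0 = accuracy(h0(zeros, x2))         # h0 loses X1
comp_hp = accuracy(hp_without_s(n, rng))  # hp loses S
```

Both models reach accuracy 1.0, so there is no prediction gain. However, `suff_h0` and `comp_h0` come out near 0.75, because for independent standard normals the sign of $X_1$ alone matches the sign of $X_1 + X_2$ three quarters of the time. By contrast, `suff_hp` is 1.0 and `comp_hp` is near 0.5. Under these stand-in measures, the explanation gap is strictly in favour of $h_p$.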

Figures (9)

  • Figure 1: Impact of personalization on fairness in prediction and explainability. (A) Comparison between a generic ML model ($h_0$), which uses only input features ($\mathbf{X}$), and a personalized model ($h_p$), which incorporates additional personal attributes ($S_1, \dots, S_k$). (B) Group-specific effects on prediction and explanation under personalization. While some groups benefit from improved prediction and explanation, others experience trade-offs, including worsened prediction accuracy or explainability.
  • Figure 2: Personalized models should not be dismissed just because they do not provide a clear BoP gain in prediction accuracy: explainability could still improve (see Theorem 4.1). This figure compares a generic model ($h_0$) and a personalized model ($h_p$) in terms of prediction accuracy and explanation quality. Explanation quality is measured by sufficiency and comprehensiveness, as defined in the paper's table of costs. The generic model $h_0$ uses both $X_1$ and $X_2$, predicting with the decision boundary $X_1 + X_2 > 0$. The personalized model $h_p$, which has access to the group attribute $S = X_1 + X_2$, relies entirely on $S > 0$ for its predictions (middle column). In the sufficiency evaluation (left column), only the most important feature is kept. Here $h_p$ still predicts perfectly because it relies solely on $S$, reaching maximum sufficiency. In contrast, $h_0$, which keeps only $X_1$, has a lower sufficiency score. Personalization thus enhances explainability even though prediction accuracy is unchanged. In the comprehensiveness evaluation (right column), the most important feature is removed. When only $X = (X_1, X_2)$ is available, $h_p$ defaults to random guessing, since it never learned to use $X_1$ or $X_2$. This gives $h_p$ the minimum comprehensiveness value, indicating the best explanation performance, whereas $h_0$ has a higher comprehensiveness score. Again, personalization improves explainability according to this measure without affecting prediction accuracy.
  • Figure 3: Lower bound on the probability of error $P_e$ versus the number of attributes $k$, which defines the number of groups $d = 2^k$, for $\epsilon=0.01$. For three different sample sizes $N$, we consider a categorical BoP (orange), Gaussian BoPs with different variances $\sigma^2$ (blue), and Laplace BoPs with several scale parameters $b = \frac{\sigma}{\sqrt{2}}$. For small $\sigma$ values in the Gaussian case, more attributes $k$ can be used before $P_e$ reaches $1/2$ than in the categorical case. The Laplace case allows more attributes than the categorical case in all settings shown, and more than the Gaussian case in most. For this example, we use the $P_e$ functions assuming each group has $m = \lfloor \frac{N}{d} \rfloor$ samples.
  • Figure 4: Leveraging the validation framework, we plot how $P_e$ changes with $\epsilon$ for a fixed $N$ and $k = 2$, using Corollaries 5.2 and 5.3. Panel A uses the Laplace and Gaussian forms; panel B uses the categorical form.
  • Figure 5: For a linear model, an absence of benefit in explanation quality implies an absence of benefit in prediction accuracy, as illustrated here (see Theorem 4.2). We consider a linear model $Y = X + S + \epsilon$, with $h_0$ and $h_p$ Bayes optimal regressors. In this example, the absence of benefit of personalization for explanation quality evaluated in terms of sufficiency (left column), $\text{BoP-X}^{\text{suff}}=0$, means $\Delta \text{MSE}_0 = \Delta \text{MSE}_p \Rightarrow \text{var}(X) = 0$. Then, the absence of benefit evaluated in terms of comprehensiveness (right column), $\text{BoP-X}^{\text{comp}}=0$, means $\Delta \text{MSE}_0 = \Delta \text{MSE}_p \Rightarrow \text{var}(S) = \text{var}(X) \Rightarrow \text{var}(S) = 0$. We conclude that, in terms of prediction accuracy (middle column), $\text{MSE}_0 = \text{MSE}_p$, so there is also no benefit of personalization in prediction: $\text{BoP-P}=0$.
  • ...and 4 more figures
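The variance argument in the Figure 5 caption can be reproduced with a small Monte Carlo sketch. It rests on several assumptions of ours. First, $X$, $S$ and the noise are independent zero-mean Gaussians. Second, a masked feature is replaced by its mean, 0. Third, $S$ is the most important feature of $h_p$, which holds when $\text{var}(S) > \text{var}(X)$ and is the case the caption's derivation covers. Fourth, $\Delta\text{MSE}$ denotes the MSE after masking minus the MSE on full inputs. The function name is hypothetical.

```python
import numpy as np


def linear_model_quantities(var_x, var_s, noise_var, n=400_000, seed=0):
    """Monte Carlo MSE quantities for Y = X + S + eps (Figure 5 setting).

    Assumptions (ours): X, S, eps independent zero-mean Gaussians; a masked
    feature is replaced by its mean, 0; S is h_p's most important feature.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(var_x), n)
    s = rng.normal(0.0, np.sqrt(var_s), n)
    y = x + s + rng.normal(0.0, np.sqrt(noise_var), n)

    def mse(pred):
        return float(np.mean((y - pred) ** 2))

    mse_0 = mse(x)      # h0(X) = E[Y | X] = X
    mse_p = mse(x + s)  # hp(X, S) = E[Y | X, S] = X + S
    return {
        "BoP-P": mse_0 - mse_p,
        # Sufficiency: keep only the top feature (X for h0, S for hp).
        "dMSE_suff_0": mse(x) - mse_0,  # h0 with X kept is unchanged
        "dMSE_suff_p": mse(s) - mse_p,  # hp(0, S) = S
        # Comprehensiveness: remove the top feature.
        "dMSE_comp_0": mse(np.zeros(n)) - mse_0,  # h0(0) = 0
        "dMSE_comp_p": mse(x) - mse_p,            # hp(X, 0) = X
    }


quantities = linear_model_quantities(var_x=1.0, var_s=4.0, noise_var=0.25)
```

The estimates come out near $\Delta\text{MSE}^{\text{suff}}_0 = 0$, $\Delta\text{MSE}^{\text{suff}}_p \approx \text{var}(X)$, $\Delta\text{MSE}^{\text{comp}}_0 \approx \text{var}(X)$, and $\Delta\text{MSE}^{\text{comp}}_p \approx \text{var}(S) \approx \text{BoP-P}$. Equal sufficiency deltas therefore force $\text{var}(X) = 0$, and equal comprehensiveness deltas then force $\text{var}(S) = 0$. Under these assumptions, $\text{BoP-P}$ must vanish too.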

Theorems & Definitions (33)

  • Definition 3.1: Model cost
  • Definition 3.2: Benefit of Personalization (BoP)
  • Definition 3.3: Minimal Group BoP
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 5.1: Lower bound for BoP
  • Corollary 5.2: Lower Bound Exponential Family Distributions
  • Corollary 5.3: Lower bound Laplace Distribution
  • Corollary 5.4: Maximum number of attributes
  • Definition 2.1: Population Cost
  • ...and 23 more
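Corollary 5.4 and Figure 3 rest on some simple bookkeeping that is worth making explicit. Here $k$ binary personal attributes split the data into $d = 2^k$ groups, and Figure 3 assumes each group gets $m = \lfloor N/d \rfloor$ samples. The plain-Python sketch below shows how quickly $m$ shrinks; $N = 1000$ is an arbitrary illustrative value, and the helper names are hypothetical. It also checks that the Laplace scale $b = \sigma/\sqrt{2}$ used in Figure 3 matches a Gaussian of variance $\sigma^2$, since a Laplace distribution with scale $b$ has variance $2b^2$. It does not reproduce the paper's Lambert W bound, whose formula is not given here.

```python
import math


def samples_per_group(n_samples: int, k: int) -> int:
    """m = floor(N / d) samples per group, with d = 2**k groups from k binary attributes."""
    return n_samples // (2 ** k)


def laplace_scale(variance: float) -> float:
    """Laplace scale b with the given variance: Var = 2 b^2, so b = sigma / sqrt(2)."""
    return math.sqrt(variance / 2.0)


# Illustrative N = 1000 (not a value taken from the paper's figures).
group_sizes = {k: samples_per_group(1000, k) for k in range(1, 11)}
```

With $N = 1000$, the per-group count halves with every added attribute: 500, 250, 125, and so on. At $k = 10$ there are more groups ($d = 1024$) than samples, so $m = 0$. This is one intuitive reason why the error lower bound rises with $k$.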