A Multi-Objective Evaluation Framework for Analyzing Utility-Fairness Trade-Offs in Machine Learning Systems

Gökhan Özbulak, Oscar Jimenez-del-Toro, Maíra Fatoretto, Lilian Berton, André Anjos

TL;DR

This work tackles the challenge of evaluating machine learning systems under multiple utility and fairness objectives by introducing a model-agnostic, multi-objective evaluation framework grounded in Pareto Front (PF) analysis. It combines four types of performance indicators: convergence, diversity, convergence-diversity (hypervolume), and capacity (non-dominated solution counts), together with a radar-chart visualization, to give a compact quantitative and qualitative comparison across black-box and white-box evaluation scenarios. The framework is demonstrated through synthetic simulations and an empirical study on the Harvard Glaucoma Fairness dataset, using Pareto HyperNetworks to generate trade-off sub-models, and it highlights practical decision-making when tuning fairness against utility. The approach offers a structured, transparent means to analyze and select ML strategies under complex fairness requirements, while acknowledging its computational cost and the potential need for adjustable indicator weighting.

Abstract

The evaluation of fairness models in Machine Learning involves complex challenges, such as defining appropriate metrics and balancing trade-offs between utility and fairness, and gaps remain at this stage. This work presents a novel multi-objective evaluation framework that enables the analysis of utility-fairness trade-offs in Machine Learning systems. The framework was developed using criteria from Multi-Objective Optimization that collect comprehensive information regarding this complex evaluation task. The assessment of multiple Machine Learning systems is summarized, both quantitatively and qualitatively, in a straightforward manner through a radar chart and a measurement table encompassing various aspects such as convergence, system capacity, and diversity. The framework's compact representation of performance facilitates the comparative analysis of different Machine Learning strategies for decision-makers in real-world applications with single or multiple fairness requirements. The framework is model-agnostic and flexible enough to be adapted to any kind of Machine Learning system (black- or white-box) and to any kind and number of evaluation metrics, including multidimensional fairness criteria. The functionality and effectiveness of the proposed framework are shown with different simulations and an empirical study conducted on a real-world dataset with various Machine Learning systems.

Paper Structure

This paper contains 18 sections, 10 equations, 13 figures.

Figures (13)

  • Figure 1: Scenario 1: System evaluation as a black-box test.
  • Figure 2: Scenario 2: System evaluation as a white-box test.
  • Figure 3: Dominance in bi-objective (a) minimization and (b) maximization problems: $x'$ is dominated by $x$ with respect to the reference point $r$.
  • Figure 4: The approximate PF $S$ is shown for both bi-objective minimization (a) and maximization (b) tasks: $x'_i$ is dominated by $x_j$ with respect to the reference point $r$.
  • Figure 5: Diversity: (a) System 1 (blue) provides solutions that are more uniformly distributed than System 2 (red) and therefore has lower $UD$. (b) System 1 (blue) better covers the extremes of the PF approximations and therefore has better spread (larger $OS$) than System 2 (red).
  • ...and 8 more figures
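
The dominance relation in Figures 3 and 4, and the capacity and hypervolume indicators named in the TL;DR, can be illustrated with a minimal Python sketch. It assumes a bi-objective minimization task and a reference point that is worse than every solution in both objectives. The function names (`dominates`, `non_dominated`, `hypervolume_2d`) and the sample points are hypothetical and are not taken from the paper's implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x dominates y when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (the approximate PF)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    # Only points strictly better than the reference point contribute.
    front = sorted(p for p in non_dominated(points)
                   if p[0] < ref[0] and p[1] < ref[1])
    area, prev_f2 = 0.0, ref[1]
    # Sorted by ascending first objective, so the second one descends.
    for f1, f2 in front:
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area


# Hypothetical solutions of one system: (objective 1, objective 2).
solutions = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
reference = (5.0, 5.0)

front = non_dominated(solutions)
capacity = len(front)  # count of non-dominated solutions
hv = hypervolume_2d(solutions, reference)
print(front, capacity, hv)
```

In this toy example, (3, 3) and (4, 4) are both dominated by (2, 2), so three solutions remain on the approximate front. A maximization task can reuse the same functions by negating the objective values and the reference point.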