A Multi-Objective Evaluation Framework for Analyzing Utility-Fairness Trade-Offs in Machine Learning Systems
Gökhan Özbulak, Oscar Jimenez-del-Toro, Maíra Fatoretto, Lilian Berton, André Anjos
TL;DR
This work tackles the challenge of evaluating machine learning systems under multiple utility and fairness objectives by introducing a model-agnostic, multi-objective evaluation framework grounded in Pareto front analysis. It combines four performance indicators, spanning diversity, convergence-diversity (hypervolume), and capacity (non-dominated solution counts), with a radar-chart visualization to give a compact quantitative and qualitative comparison across black-box and white-box deployment scenarios. The framework is demonstrated through synthetic simulations and an empirical study on the Harvard Glaucoma Fairness dataset, in which Pareto HyperNetworks generate the trade-off sub-models, highlighting practical decision-making when tuning fairness against utility. The approach offers a structured, transparent way to analyze and select ML strategies under complex fairness requirements, while acknowledging its computational cost and the potential need for adjustable indicator weighting.
Abstract
Evaluating fairness models in Machine Learning involves complex challenges, such as defining appropriate metrics and balancing trade-offs between utility and fairness, and gaps remain at this stage. This work presents a novel multi-objective evaluation framework that enables the analysis of utility-fairness trade-offs in Machine Learning systems. The framework is built on criteria from Multi-Objective Optimization that capture comprehensive information about this complex evaluation task. The assessment of multiple Machine Learning systems is summarized, both quantitatively and qualitatively, through a radar chart and a measurement table covering aspects such as convergence, system capacity, and diversity. This compact representation of performance helps decision-makers compare different Machine Learning strategies in real-world applications with single or multiple fairness requirements. The framework is model-agnostic and can be adapted to any kind of Machine Learning system, black-box or white-box, and to any kind and number of evaluation metrics, including multidimensional fairness criteria. The functionality and effectiveness of the proposed framework are shown through different simulations and an empirical study conducted on a real-world dataset with various Machine Learning systems.
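To make two of the indicators named above concrete, the following is a minimal sketch of a capacity measure (the count of non-dominated solutions) and a hypervolume measure. It is not the authors' implementation. It assumes exactly two objectives, one utility score and one fairness score, both scaled so that higher is better, and a reference point at (0, 0). The function names `non_dominated` and `hypervolume_2d` and the sample scores are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Assumption: each point is (utility, fairness), and both are maximized.
Point = Tuple[float, float]


def non_dominated(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (maximization)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    # Deduplicate and sort by increasing utility.
    return sorted(set(front))


def hypervolume_2d(front: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by a 2-D front, measured from the reference point.

    Assumes every point in `front` is at least as good as `ref` in both
    objectives.
    """
    area = 0.0
    prev_fairness = ref[1]
    # Sweep from highest utility to lowest; fairness rises along the front.
    for utility, fairness in sorted(front, reverse=True):
        if fairness > prev_fairness:
            area += (utility - ref[0]) * (fairness - prev_fairness)
            prev_fairness = fairness
    return area


if __name__ == "__main__":
    # Hypothetical (utility, fairness) scores for five sub-models.
    scores = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.5, 0.5), (0.3, 0.3)]
    front = non_dominated(scores)
    print("capacity (non-dominated count):", len(front))
    print(f"hypervolume: {hypervolume_2d(front):.2f}")
```

A system whose front has more non-dominated points offers the decision-maker more trade-off options, and a larger hypervolume indicates a front that is both closer to the ideal and more widely spread. With more than two objectives, the sweep above no longer applies and a general hypervolume algorithm would be needed.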
