From Models to Systems: A Comprehensive Fairness Framework for Compositional Recommender Systems
Brian Hsu, Cyrus DiCiccio, Natesh Sivasubramoniapillai, Hongseok Namkoong
TL;DR
This paper argues that fairness in industrial recommender systems requires a system-level perspective spanning retrieval, scoring, and serving, rather than a focus on single-model fairness. It formalizes a compositional utility framework in which end-user utility is driven by group-specific preferences and calibrated model outputs, and shows that disparities can persist because of upstream–downstream interactions and heterogeneous user preferences. To mitigate these disparities, it introduces a Bayesian optimization (BayesOpt) approach that jointly optimizes overall utility and the Deviation from Equal Representation (DER) metric via Expected Hypervolume Improvement (EHVI), integrating multi-label fairness with downstream business objectives. Empirical results on synthetic and real datasets show that the proposed Fair EHVI method yields better Pareto frontiers for utility and fairness than baselines, underscoring the value of system-level fairness tools for deployment contexts and regulatory regimes. The work also highlights practical implications for timescale considerations and governance when pursuing equity across diverse user populations in large-scale recommender pipelines.
Abstract
Fairness research in machine learning often centers on ensuring equitable performance of individual models. However, real-world recommendation systems are built on multiple models and even multiple stages, from candidate retrieval to scoring and serving, which raises challenges for responsible development and deployment. This system-level view, as highlighted by regulations like the EU AI Act, necessitates moving beyond auditing individual models as independent entities. We propose a holistic framework for modeling system-level fairness, focusing on the end-utility delivered to diverse user groups, and consider interactions between components such as retrieval and scoring models. We provide formal insights on the limitations of focusing solely on model-level fairness and highlight the need for alternative tools that account for heterogeneity in user preferences. To mitigate system-level disparities, we adapt closed-box optimization tools (e.g., BayesOpt) to jointly optimize utility and equity. We empirically demonstrate the effectiveness of our proposed framework on synthetic and real datasets, underscoring the need for a system-level framework.
