Data Generation via Latent Factor Simulation for Fairness-aware Re-ranking

Elena Stefancova, Cassidy All, Joshua Paup, Martin Homola, Nicholas Mattei, Robin Burke

TL;DR

The paper tackles the lack of datasets with protected features for evaluating fairness-aware recommender systems by introducing Latent Factor Simulation (LAFS), a fully synthetic data generator tailored for post-processing re-ranking. It constructs user and item latent-factor matrices $U \in \mathbb{R}^{n_u \times k}$ and $V \in \mathbb{R}^{n_i \times k}$ from per-factor propensities, with the first $k_s$ factors encoding protected features, and computes ratings via $r_{ui} = U_u^\top V_i$. A bias penalty from $B$ is applied to items with protected features before the top-$l$ recommendations are selected, with optional min-max normalization, and the framework supports dynamic user regimes to model fairness under population shifts. The authors situate LAFS within the synthetic-data and post-processing fairness literature, discuss current limitations of synthetic evaluation, and outline future improvements such as incorporating item popularity distributions and feature covariance. LAFS is released as open source under the MIT License. This approach enables controlled, repeatable fairness analyses in re-ranking research without relying on sensitive real-world data.
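The pipeline summarized above can be illustrated with a minimal sketch. This is not the authors' released implementation. It assumes that each per-factor propensity is a (mean, standard deviation) pair for a Gaussian draw, that an item "has" protected feature $f$ when its factor value is positive, that $B$ holds one scalar penalty per protected factor which is subtracted from the rating, and that min-max normalization is applied to the scores of each selected list. The function names `make_factors` and `recommend` are hypothetical.

```python
import random

def make_factors(n_rows, propensities, rng):
    """Draw one latent vector per row.

    Assumption: each propensity is a (mean, std) pair for a Gaussian.
    """
    return [[rng.gauss(mu, sigma) for mu, sigma in propensities]
            for _ in range(n_rows)]

def dot(a, b):
    """Inner product of two equal-length vectors: r_ui = U_u^T V_i."""
    return sum(x * y for x, y in zip(a, b))

def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def recommend(U, V, k_s, B, l, normalize=True):
    """Return, for each user, the top-l (item_index, score) pairs."""
    lists = []
    for u in U:
        scores = []
        for v in V:
            r = dot(u, v)
            # Assumption: protected feature f is present when v[f] > 0,
            # and its penalty B[f] is subtracted from the rating.
            r -= sum(B[f] for f in range(k_s) if v[f] > 0)
            scores.append(r)
        # Select the l highest-scoring items after the penalty is applied.
        top = sorted(range(len(V)), key=lambda i: scores[i], reverse=True)[:l]
        top_scores = [scores[i] for i in top]
        if normalize:
            top_scores = min_max(top_scores)
        lists.append(list(zip(top, top_scores)))
    return lists

rng = random.Random(0)          # fixed seed for repeatable output
k, k_s = 4, 1                   # 4 factors, the first one protected
user_prop = [(0.0, 1.0)] * k
item_prop = [(-0.5, 1.0)] + [(0.0, 1.0)] * (k - 1)
U = make_factors(5, user_prop, rng)    # n_u = 5 users
V = make_factors(20, item_prop, rng)   # n_i = 20 items
recs = recommend(U, V, k_s, B=[0.5], l=3)
```

Each entry of `recs` is one user's recommendation list, which is the kind of output a re-ranking algorithm would take as input. Fixing the seed makes the generated lists repeatable across runs.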

Abstract

Synthetic data is a useful resource for algorithmic research. It allows for the evaluation of systems under a range of conditions that might be difficult to achieve in real-world settings. In recommender systems, the use of synthetic data is somewhat limited; some work has concentrated on building user-item interaction data at large scale. We believe that fairness-aware recommendation research can benefit from simulated data as it allows the study of protected groups and their interactions without depending on sensitive data that needs privacy protection. In this paper, we propose a novel type of data for fairness-aware recommendation: synthetic recommender system outputs that can be used to study re-ranking algorithms.

Paper Structure

This paper contains 7 sections, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Overview of the LAFS data generation process