
Simultaneous Unlearning of Multiple Protected User Attributes From Variational Autoencoder Recommenders Using Adversarial Training

Gustavo Escobedo, Christian Ganhör, Stefan Brandl, Mirjam Augstein, Markus Schedl

TL;DR

The paper presents AdvXMultVAE, which couples a variational autoencoder recommender with adversarial training to unlearn multiple protected user attributes (exemplified by gender and age) simultaneously. It supports attributes with continuous and/or categorical values and aims to improve fairness across demographic user groups.

Abstract

In widely used neural network-based collaborative filtering models, users' history logs are encoded into latent embeddings that represent the users' preferences. In this setting, users' protected attributes (e.g., gender or ethnicity) can be inferred from these user embeddings even though the models have no explicit access to them, which may lead models to treat specific demographic user groups unfairly and raises privacy issues. While prior work has addressed the removal of a single protected attribute at a time, multiple attributes may come into play in real-world scenarios. In the work at hand, we present AdvXMultVAE, which aims to unlearn multiple protected attributes (exemplified by gender and age) simultaneously to improve fairness across demographic user groups. For this purpose, we couple a variational autoencoder (VAE) architecture with adversarial training (AdvMultVAE) to support the simultaneous removal of users' protected attributes with continuous and/or categorical values. Our experiments on two datasets from the music and movie domains, LFM-2b-100k and Ml-1m, respectively, show that our approach can mitigate demographic biases more effectively than its single-attribute removal counterparts (based on AdvMultVAE) while improving the anonymity of latent embeddings.
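The abstract does not spell out the training objective, so the following is only a minimal sketch of how simultaneous unlearning via reversed gradients could work, assuming a standard gradient reversal layer with one scaling factor per attribute (in the spirit of $\lambda_{Gender}$ and $\lambda_{Age}$ in Figure 4). The adversaries are hypothetical stand-ins: a logistic-regression classifier for a binary gender label (categorical) and a linear regressor for a normalized age value (continuous). All function names are illustrative and are not the authors' code. Under this assumption, the gradient reaching the encoder is the VAE loss gradient minus each adversary's gradient scaled by its λ, so one descent step on the encoder pushes both adversaries' losses up at once.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gender_adversary_grad(z: Vector, w: Vector, label: int) -> Vector:
    """Gradient w.r.t. latent vector z of the binary cross-entropy of a
    hypothetical logistic-regression gender adversary p = sigmoid(w . z)."""
    p = sigmoid(dot(w, z))
    return [(p - label) * wi for wi in w]


def age_adversary_grad(z: Vector, w: Vector, age: float) -> Vector:
    """Gradient w.r.t. z of 0.5 * (w . z - age)^2 for a hypothetical
    linear-regression age adversary (age assumed to be normalized)."""
    err = dot(w, z) - age
    return [err * wi for wi in w]


def reversed_encoder_grad(
    vae_grad: Vector, adv_grads: List[Vector], lambdas: List[float]
) -> Vector:
    """Gradient reaching the encoder: the VAE loss gradient passes unchanged,
    while each adversary gradient is multiplied by -lambda_k (gradient reversal)."""
    out = list(vae_grad)
    for grad, lam in zip(adv_grads, lambdas):
        out = [o - lam * g for o, g in zip(out, grad)]
    return out


def gender_loss(w: Vector, z: Vector) -> float:
    """Cross-entropy of the toy gender adversary for label 1."""
    return -math.log(sigmoid(dot(w, z)))


def age_loss(w: Vector, z: Vector, age: float) -> float:
    """Squared error of the toy age adversary."""
    return (dot(w, z) - age) ** 2


if __name__ == "__main__":
    z = [0.5, -1.0]                          # toy latent user embedding
    w_gender, w_age = [1.0, 2.0], [0.2, 0.1]  # fixed toy adversary weights
    grads = [gender_adversary_grad(z, w_gender, 1),
             age_adversary_grad(z, w_age, 1.0)]
    # VAE gradient set to zero to isolate the effect of the reversed gradients
    g = reversed_encoder_grad([0.0, 0.0], grads, lambdas=[1.0, 0.5])
    z_new = [zi - 0.01 * gi for zi, gi in zip(z, g)]  # one encoder descent step
    print(gender_loss(w_gender, z_new) > gender_loss(w_gender, z),
          age_loss(w_age, z_new, 1.0) > age_loss(w_age, z, 1.0))  # prints: True True
```

With `lambdas=[0.0, 0.0]` the reversal is switched off and the encoder only receives the VAE loss gradient; raising one factor strengthens the push against that attribute's adversary. In an actual model the reversal would be implemented inside an autograd framework rather than by hand as here.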


Paper Structure

This paper contains 14 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Outline of our AdvXMultVAE approach. The thin gray arrows flowing from top to bottom indicate the forward pass, and the bold arrows indicate the backward pass. The red and orange arrows highlight the reversed gradients of the attributes $p_1$ and $p_2$, respectively, which the model should unlearn.
  • Figure 2: Performance for debiasing ($BAcc_{G}$ and $MAE_{A}$) and recommendation accuracy ($NDCG@10$) on the Ml-1m dataset. Points in the upper-left corner correspond to the best privacy-preserving models; here, the AdvXMultVAE removal variants yield the strongest debiasing power at a marginal loss in NDCG. The intersection of the dotted lines represents the model without debiasing (MultVAE).
  • Figure 3: t-SNE plots of the users' latent embeddings and the attacker networks' gender predictions for the Ml-1m dataset. Each plot also shows the density of each gender's distribution along the corresponding latent dimension.
  • Figure 4: Interaction of different gradient scaling factors ($\lambda_{Gender}$, $\lambda_{Age}$) with the reported metrics on the Ml-1m dataset. From left to right, the subplots show the mean values of NDCG, $BAcc_{G}$, and $MAE_{A}$.
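The captions report gender debiasing as an attacker's balanced accuracy ($BAcc_{G}$), age debiasing as an attacker's mean absolute error ($MAE_{A}$), and recommendation accuracy as $NDCG@10$, but this section does not define them. The following is a minimal sketch using their common textbook definitions, assuming binary relevance and a log2 rank discount for NDCG; the paper's exact evaluation protocol (e.g., how held-out items are selected) may differ, and the function names are illustrative. Under these definitions, a $BAcc_{G}$ near chance level (0.5 for two classes) and a higher $MAE_{A}$ mean the attacker recovers the attribute less well.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence, Set


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall; 0.5 is chance level for a binary attribute."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between true and predicted values (e.g., ages)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ndcg_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 10) -> float:
    """Binary-relevance NDCG@k with a 1 / log2(rank + 2) discount (0-based rank)."""
    dcg = sum(1.0 / math.log2(r + 2)
              for r, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(r + 2) for r in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    genders = ["f", "f", "m", "m", "m", "m"]
    print(balanced_accuracy(genders, ["m"] * 6))                      # 0.5
    print(balanced_accuracy(genders, ["f", "m", "m", "m", "m", "m"]))  # 0.75
    print(mean_absolute_error([25, 35, 50], [30, 30, 40]))           # about 6.67
    print(ndcg_at_k(["a", "b", "c"], {"a", "c"}))                    # about 0.92
```

Balanced accuracy is a natural choice when one gender group dominates: an attacker that always predicts the majority class scores 0.5 rather than the majority share, as the first usage line shows.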