A Novel Generative Model with Causality Constraint for Mitigating Biases in Recommender Systems

Jianfeng Deng, Qingfeng Chen, Debo Cheng, Jiuyong Li, Lin Liu, Shichao Zhang

TL;DR

This paper tackles latent confounding bias in recommender systems by introducing LCDR, a generative framework that uses an identifiable VAE (iVAE) to learn causally informative latent representations from proxy signals. LCDR constrains a latent-causality-aware VAE (LCVAE) so that its latent space $Z_{lc}$ aligns with the iVAE-derived $Z$ through an $\ell_2$ penalty $\lambda\|Z_{lc}-Z\|_2$ in the ELBO, which lets the model use noisy proxies to recover latent confounders. The method combines a matrix-factorisation backbone with the causally constrained latent representation to improve both bias mitigation and recommendation accuracy. It is validated on Coat, Yahoo!R3, and KuaiRand, where LCDR consistently outperforms state-of-the-art baselines and ablations. The work also provides identifiability results for the learned representations under realistic conditions and demonstrates robustness to proxy quality, which makes it relevant to real-world recommender systems facing latent confounding and sparse, biased data.

Abstract

Accurately predicting counterfactual user feedback is essential for building effective recommender systems. However, latent confounding bias can obscure the true causal relationship between user feedback and item exposure, ultimately degrading recommendation performance. Existing causal debiasing approaches often rely on strong assumptions, such as the availability of instrumental variables (IVs) or strong correlations between latent confounders and proxy variables, that are rarely satisfied in real-world scenarios. To address these limitations, we propose a novel generative framework called Latent Causality Constraints for Debiasing representation learning in Recommender Systems (LCDR). Specifically, LCDR leverages an identifiable Variational Autoencoder (iVAE) as a causal constraint to align the latent representations learned by a standard Variational Autoencoder (VAE) through a unified loss function. This alignment allows the model to leverage even weak or noisy proxy variables to recover latent confounders effectively. The resulting representations are then used to improve recommendation performance. Extensive experiments on three real-world datasets demonstrate that LCDR consistently outperforms existing methods in both mitigating bias and improving recommendation accuracy.
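
The alignment idea described above (a VAE objective with an added $\ell_2$ penalty $\lambda\|Z_{lc}-Z\|_2$ pulling $Z_{lc}$ towards the iVAE latents $Z$) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the squared-error reconstruction term, the standard-normal prior, and the choice to write the objective as a negative ELBO to be minimised are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """Per-sample KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)


def constrained_neg_elbo(x, x_recon, mu_lc, logvar_lc, z_lc, z_ivae, lam=1.0):
    """Hypothetical penalised objective: negative ELBO + lam * ||z_lc - z_ivae||_2.

    x, x_recon       : (batch, features) inputs and their reconstructions
    mu_lc, logvar_lc : (batch, latent) encoder outputs of the constrained VAE
    z_lc             : (batch, latent) sampled latents of the constrained VAE
    z_ivae           : (batch, latent) latents from an iVAE (treated as given)
    lam              : weight of the alignment penalty
    """
    # Reconstruction term: squared error (an assumed Gaussian likelihood).
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Regularisation term against an assumed standard-normal prior.
    kl = gaussian_kl(mu_lc, logvar_lc)
    # Alignment term: per-sample l2 distance between the two latent codes.
    align = np.linalg.norm(z_lc - z_ivae, axis=1)
    return float(np.mean(recon + kl + lam * align))
```

With a perfect reconstruction, a posterior equal to the prior, and identical latents, the objective is zero; any gap between `z_lc` and `z_ivae` then contributes `lam` times its Euclidean length, so `lam` controls how strongly the iVAE latents constrain the VAE.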

Paper Structure

This paper contains 28 sections, 1 theorem, 24 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Suppose we have data collected from a generative model as defined by Eqs. zm1-zm3, with parameters $(\mathbf{f}, \mathbf{H}, \mathbf{\lambda})$. If the following conditions hold: Then the parameters $(\mathbf{f}, \mathbf{H}, \mathbf{\lambda})$ are $\sim_M$ identifiable.

Figures (6)

  • Figure 1: An illustrative DAG showing how latent confounders affect recommender systems. ${R}$ represents the outcome variables, ${A}$ indicates exposure status, ${Z}$ denotes the latent causal representations learned by the iVAE, and ${Z_{lc}}$ denotes the latent representations learned by the latent causal-constrained VAE. ${W}$ represents the proxy variables; ${W}$ does not adjust the distribution of ${Z_{lc}}$ directly but does so indirectly through ${Z}$.
  • Figure 2: The overview of our proposed LCDR method. First, LCDR employs the latent causal representations $Z$, inferred by the iVAE, to constrain the representations generated by the VAE, resulting in the constrained representations $Z_{lc}$. Next, it leverages $Z_{lc}$ to enhance the performance of the recommendation model. This design is motivated by the fact that VAE cannot recover the true latent causal structure without proxy variables, and that proxy variables are often of low quality in practice.
  • Figure 3: The performance of all methods on the three real-world datasets.
  • Figure : (a) Effect of the $\lambda$ selection. We show the results of NDCG@5 on the Coat dataset.
  • Figure : (a) Effect of the $\lambda$ selection. We show the results of NDCG@5 on the Coat dataset.
  • ...and 1 more figure
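
The captions above report NDCG@5. For readers unfamiliar with the metric, a minimal sketch of NDCG@k for a single user follows. It uses the linear-gain form of DCG (relevance divided by $\log_2(\text{rank}+1)$); whether the paper uses this form or the exponential-gain variant is not stated here, so treat that choice and the function names as assumptions. For binary labels the two variants coincide.

```python
import numpy as np


def dcg_at_k(relevance, k):
    """Discounted cumulative gain of the first k items in the given order."""
    rel = np.asarray(relevance, dtype=float)[:k]
    # Rank r (1-based) is discounted by log2(r + 1).
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))


def ndcg_at_k(scores, labels, k=5):
    """NDCG@k for one user: DCG of the predicted ranking over the ideal DCG."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores)       # items sorted by predicted score, best first
    ideal = np.sort(labels)[::-1]     # best achievable ordering of the labels
    idcg = dcg_at_k(ideal, k)
    if idcg == 0.0:                   # no relevant items: define NDCG as 0
        return 0.0
    return dcg_at_k(labels[order], k) / idcg
```

A ranking that places all relevant items first scores 1.0; pushing a relevant item down the list lowers the score according to the logarithmic discount. Dataset-level NDCG@5 is typically the mean of this value over users.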

Theorems & Definitions (4)

  • Definition 1
  • Definition 2: Identifiability classes
  • Definition 3
  • Theorem 1