Recommendations as Treatments: Debiasing Learning and Evaluation

Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims

TL;DR

This work reframes recommender-system evaluation and learning under missing-not-at-random data as a causal problem of interventions. It develops propensity-based estimators (IPS and SNIPS) for unbiased performance metrics and integrates them into an ERM framework, yielding a scalable propensity-scored matrix factorization approach. The authors also present practical propensity-estimation methods and provide theoretical generalization bounds, demonstrating robustness to propensity misspecification. Extensive semi-synthetic and real-world experiments show that propensity-weighted learning and evaluation substantially improve predictive accuracy and bias resistance compared with traditional and joint-likelihood MNAR methods, underscoring the method's practicality and impact for deployed systems.

Abstract

Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, finding that it is highly practical and scalable.

Paper Structure

This paper contains 29 sections, 4 theorems, 17 equations, 4 figures, 2 tables.

Key Result

Proposition 3.1

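Let $P$ be the independent Bernoulli probabilities of observing each entry. For any given $\hat{Y}$ and $Y$, with probability $1-\eta$, the IPS estimator $\hat{R}_{IPS}(\hat{Y}|P)$ does not deviate from the true $R(\hat{Y})$ by more than:

$$\left|\hat{R}_{IPS}(\hat{Y}|P) - R(\hat{Y})\right| \le \frac{1}{U \cdot I}\sqrt{\frac{\log\frac{2}{\eta}}{2}\sum_{u,i}\rho_{u,i}^{2}}$$

where $U$ and $I$ are the numbers of users and items, $\rho_{u,i}=\frac{\delta_{u,i}(Y,\hat{Y})}{P_{u,i}}$ if $P_{u,i}<1$, and $\rho_{u,i}=0$ otherwise.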
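To make the estimators named in this summary concrete, here is a minimal sketch in plain Python. It assumes the usual forms: the naive estimator averages the loss $\delta_{u,i}$ over observed entries only, IPS sums $\delta_{u,i}/P_{u,i}$ over observed entries and divides by the number of cells $U \cdot I$, and SNIPS divides the same weighted sum by $\sum 1/P_{u,i}$ over observed entries. The function names, the nested-list layout, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def naive_estimate(delta, observed):
    """Mean loss over observed entries only; biased when data are MNAR."""
    vals = [d for d_row, o_row in zip(delta, observed)
            for d, o in zip(d_row, o_row) if o]
    return sum(vals) / len(vals)


def ips_estimate(delta, propensity, observed):
    """Observed losses weighted by 1/P, averaged over all U*I cells."""
    n_cells = len(delta) * len(delta[0])
    total = sum(d / p
                for d_row, p_row, o_row in zip(delta, propensity, observed)
                for d, p, o in zip(d_row, p_row, o_row) if o)
    return total / n_cells


def snips_estimate(delta, propensity, observed):
    """Self-normalized IPS: weighted loss sum divided by the sum of weights."""
    num = den = 0.0
    for d_row, p_row, o_row in zip(delta, propensity, observed):
        for d, p, o in zip(d_row, p_row, o_row):
            if o:
                num += d / p
                den += 1.0 / p
    return num / den


def expected_ips(delta, propensity):
    """Exact expectation of the IPS estimate, enumerating every
    observation pattern under independent Bernoulli propensities."""
    n_users, n_items = len(delta), len(delta[0])
    cells = [(u, i) for u in range(n_users) for i in range(n_items)]
    expectation = 0.0
    for bits in product((0, 1), repeat=len(cells)):
        observed = [[0] * n_items for _ in range(n_users)]
        prob = 1.0
        for (u, i), o in zip(cells, bits):
            observed[u][i] = o
            prob *= propensity[u][i] if o else 1.0 - propensity[u][i]
        expectation += prob * ips_estimate(delta, propensity, observed)
    return expectation


# Toy 2x2 example (made-up numbers): per-entry losses and propensities.
delta = [[1.0, 4.0], [0.0, 9.0]]
propensity = [[0.5, 0.25], [1.0, 0.1]]
observed = [[1, 0], [1, 1]]          # one particular draw of O

true_risk = sum(map(sum, delta)) / 4  # average loss over all cells
print("true risk     :", true_risk)
print("naive         :", naive_estimate(delta, observed))
print("IPS (one draw):", ips_estimate(delta, propensity, observed))
print("SNIPS         :", snips_estimate(delta, propensity, observed))
print("E[IPS]        :", expected_ips(delta, propensity))
```

A single draw of the IPS estimate can sit far from the true risk (small propensities inflate individual terms), but its expectation over observation patterns matches the true risk in this example, which is the unbiasedness property the summary refers to. SNIPS trades a small bias for lower variance by normalizing with the realized weights.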

Figures (4)

  • Figure 1: Movie-Lovers toy example. Top row: true rating matrix $Y$, propensity matrix $P$, observation indicator matrix $O$. Bottom row: two rating prediction matrices $\hat{Y}_1$ and $\hat{Y}_2$, and intervention indicator matrix $\hat{Y}_3$.
  • Figure 2: RMSE of the estimators in the experimental setting as the observed ratings exhibit varying degrees of selection bias.
  • Figure 3: Prediction error (MSE) of matrix factorization methods as the observed ratings exhibit varying degrees of selection bias (left) and as propensity estimation quality degrades (right).
  • Figure 4: RMSE of IPS and SNIPS as propensity estimates degrade. IPS with true propensities and Naive are given as reference.

Theorems & Definitions (5)

  • Proposition 3.1: Tail Bound for IPS Estimator
  • Definition 4.1: Propensity-Scored ERM for Recommendation
  • Theorem 4.2: Propensity-Scored ERM Generalization Error Bound
  • Lemma 5.1: Bias of IPS Estimator under Inaccurate Propensities
  • Theorem 5.2: Propensity-Scored ERM Generalization Error Bound under Inaccurate Propensities
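The propensity-scored ERM idea listed above can be sketched for matrix factorization as follows. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes a squared-error loss, no bias or offset terms, L2 regularization on both factor matrices, and plain full-batch gradient descent. All names (`ips_mf_loss`, `fit_ips_mf`) and the toy ratings and propensities are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ips_mf_loss(ratings, propensity, V, W, lam):
    """Propensity-weighted squared error over observed entries plus L2 penalty.
    ratings and propensity are dicts keyed by (user, item)."""
    loss = sum((y - dot(V[u], W[i])) ** 2 / propensity[(u, i)]
               for (u, i), y in ratings.items())
    reg = lam * (sum(x * x for row in V for x in row)
                 + sum(x * x for row in W for x in row))
    return loss + reg


def fit_ips_mf(ratings, propensity, n_users, n_items,
               dim=2, lam=0.01, lr=0.002, epochs=2000, seed=0):
    """Full-batch gradient descent on ips_mf_loss; returns the factors
    and the loss recorded before training and after every epoch."""
    rng = random.Random(seed)
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    history = [ips_mf_loss(ratings, propensity, V, W, lam)]
    for _ in range(epochs):
        # Gradient of the L2 penalty.
        grad_V = [[2.0 * lam * x for x in row] for row in V]
        grad_W = [[2.0 * lam * x for x in row] for row in W]
        # Gradient of the propensity-weighted squared error.
        for (u, i), y in ratings.items():
            coeff = -2.0 * (y - dot(V[u], W[i])) / propensity[(u, i)]
            for k in range(dim):
                grad_V[u][k] += coeff * W[i][k]
                grad_W[i][k] += coeff * V[u][k]
        for row, g_row in zip(V, grad_V):
            for k in range(dim):
                row[k] -= lr * g_row[k]
        for row, g_row in zip(W, grad_W):
            for k in range(dim):
                row[k] -= lr * g_row[k]
        history.append(ips_mf_loss(ratings, propensity, V, W, lam))
    return V, W, history


# Toy data (made-up): high ratings are observed more often than low ones.
ratings = {(0, 0): 5.0, (0, 1): 4.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 1.0}
propensity = {key: (0.8 if y >= 4.0 else 0.4) for key, y in ratings.items()}

V, W, history = fit_ips_mf(ratings, propensity, n_users=3, n_items=3)
print("loss before training:", history[0])
print("loss after training :", history[-1])
```

The only change relative to ordinary matrix factorization is the `1 / propensity` weight on each observed entry, so rarely observed entries count for more. Setting every propensity to the same constant recovers the unweighted objective up to scale.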