Recommendations as Treatments: Debiasing Learning and Evaluation
Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, Thorsten Joachims
TL;DR
This work treats recommendations as interventions (treatments) in the causal-inference sense, and uses that view to handle ratings that are missing not at random (MNAR) in both evaluation and learning. It develops propensity-weighted performance estimators, inverse propensity scoring (IPS) and its self-normalized variant (SNIPS), that correct for selection bias, and plugs them into an empirical risk minimization (ERM) framework, yielding a scalable propensity-scored matrix factorization method. The authors also describe practical ways to estimate propensities and give generalization bounds that characterize how the approach behaves when the propensities are misspecified. In semi-synthetic and real-world experiments, propensity-weighted learning and evaluation improve predictive accuracy and reduce the effect of selection bias compared with conventional methods and with joint-likelihood MNAR methods.
Abstract
Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handling selection biases, adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, finding that it is highly practical and scalable.
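To make the idea of a propensity-weighted performance estimator concrete, the following is a minimal sketch of the standard IPS and SNIPS forms, not the authors' implementation. It assumes that each user-item entry has a known per-entry loss and a known observation propensity; the function names and the toy data are hypothetical and chosen for illustration only. The helper `expected_ips` enumerates every possible observation pattern to check that IPS matches the true average loss in expectation.

```python
from itertools import product

def naive_estimate(losses, observed):
    """Average loss over observed entries only; biased when data are MNAR."""
    seen = [l for l, o in zip(losses, observed) if o]
    return sum(seen) / len(seen)

def ips_estimate(losses, observed, propensities):
    """IPS: sum of loss/propensity over observed entries, divided by ALL entries."""
    weighted = sum(l / p for l, o, p in zip(losses, observed, propensities) if o)
    return weighted / len(losses)

def snips_estimate(losses, observed, propensities):
    """SNIPS: same numerator, normalized by the sum of inverse propensities."""
    num = sum(l / p for l, o, p in zip(losses, observed, propensities) if o)
    den = sum(1.0 / p for o, p in zip(observed, propensities) if o)
    return num / den

def expected_ips(losses, propensities):
    """Exact expectation of IPS over every observation pattern (tiny inputs only)."""
    total = 0.0
    for pattern in product([False, True], repeat=len(losses)):
        prob = 1.0
        for o, p in zip(pattern, propensities):
            prob *= p if o else 1.0 - p
        total += prob * ips_estimate(losses, pattern, propensities)
    return total

# Hypothetical toy data: high-loss entries are observed only half the time.
losses = [4.0, 4.0, 0.0, 0.0]
propensities = [0.5, 0.5, 1.0, 1.0]
observed = [True, False, True, True]   # one possible draw of the observations
true_mean = sum(losses) / len(losses)  # the quantity we want to estimate
```

On this toy draw the true average loss is 2.0, the naive average over observed entries is 4/3, and both `ips_estimate` and `snips_estimate` return 2.0. A single draw matching the truth is a coincidence of the example; the property that matters is that `expected_ips(losses, propensities)` equals `true_mean`. SNIPS gives up exact unbiasedness in exchange for typically lower variance.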
