Retention Induced Biases in a Recommendation System with Heterogeneous Users

Shichao Ma

TL;DR

The goal is to create greater awareness and spark deeper discussions about how recommendation systems evolve in real‐world settings, focusing on the interplay between algorithmic changes, user retention dynamics and data biases that impact evaluation accuracy.

Abstract

I examine a conceptual model of a recommendation system (RS) with user inflow and churn dynamics. When inflow and churn balance out, the user distribution reaches a steady state. Changing the recommendation algorithm alters the steady state and creates a transition period. During this period, the RS behaves differently from its new steady state. In particular, A/B experiment metrics obtained in transition periods are biased indicators of the RS's long-term performance. Scholars and practitioners, however, often conduct A/B tests shortly after introducing new algorithms to validate their effectiveness. This A/B experiment paradigm, widely regarded as the gold standard for assessing RS improvements, may consequently yield false conclusions. I also briefly touch on the data bias caused by the user retention dynamics.
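
The mechanism described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the model analyzed in the paper: the two user types ("casual", "power"), the inflow, churn and engagement numbers, and every function name are hypothetical and chosen only to make the effect visible. Each period a fixed number of new users of each type arrives, and a type-specific fraction of existing users churns, so the population of each type settles at inflow divided by churn rate.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical per-period inflow of new users, by user type.
INFLOW: Dict[str, float] = {"casual": 100.0, "power": 20.0}


@dataclass(frozen=True)
class Algorithm:
    """A recommendation algorithm, reduced to its effect on each user type."""
    churn: Dict[str, float]       # per-period probability that a user leaves
    engagement: Dict[str, float]  # per-user, per-period engagement metric


# Hypothetical parameters: NEW raises engagement for everyone,
# but doubles the churn rate of power users.
OLD = Algorithm(churn={"casual": 0.5, "power": 0.05},
                engagement={"casual": 1.0, "power": 5.0})
NEW = Algorithm(churn={"casual": 0.5, "power": 0.10},
                engagement={"casual": 1.5, "power": 5.2})


def steady_state(algo: Algorithm) -> Dict[str, float]:
    """Population at which inflow equals churn: n* = inflow / churn."""
    return {t: INFLOW[t] / algo.churn[t] for t in INFLOW}


def step(pop: Dict[str, float], algo: Algorithm) -> Dict[str, float]:
    """One period: surviving users plus new arrivals."""
    return {t: pop[t] * (1.0 - algo.churn[t]) + INFLOW[t] for t in INFLOW}


def mean_engagement(pop: Dict[str, float], algo: Algorithm) -> float:
    """Average engagement per active user for a given user mix."""
    total_users = sum(pop.values())
    return sum(pop[t] * algo.engagement[t] for t in pop) / total_users


def transition_path(start: Dict[str, float], algo: Algorithm,
                    periods: int) -> List[float]:
    """Metric measured in each period after switching to `algo`."""
    pop = dict(start)
    path = []
    for _ in range(periods):
        path.append(mean_engagement(pop, algo))
        pop = step(pop, algo)
    return path


if __name__ == "__main__":
    old_mix = steady_state(OLD)
    baseline = mean_engagement(old_mix, OLD)
    path = transition_path(old_mix, NEW, periods=200)
    print(f"old algorithm, steady state : {baseline:.3f}")
    print(f"new algorithm, first period : {path[0]:.3f}")
    print(f"new algorithm, long run     : {path[-1]:.3f}")
```

With these made-up numbers, the new algorithm scores about 3.97 in the first period against a baseline of about 3.67, because it is still serving the user mix inherited from the old steady state. Once the power users it drives away have left, the metric settles near 3.35, below the baseline. A measurement taken early in the transition would therefore point in the wrong direction.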
