
Debiased Model-based Interactive Recommendation

Zijian Li, Ruichu Cai, Haiqin Huang, Sili Zhang, Yuguang Yan, Zhifeng Hao, Zhenghua Dong

TL;DR

This paper tackles bias in model-based interactive recommendation by introducing iDMIR, which couples a debiased causal world model with a debiased contrastive policy to handle time-varying popularity and sampling bias. It provides identifiability guarantees for latent user and context variables and derives an unbiased intervention distribution, enabling reliable offline training. The framework is supported by an ELBO-based learning objective and a recursive latent-state encoder, culminating in a four-step training procedure. Empirical results on Ciao, Epinions, and Yelp show that iDMIR outperforms state-of-the-art baselines in accuracy and diversity while serving as a plug-in to boost model-free methods.
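
The TL;DR mentions an ELBO-based learning objective but does not state it. As a hedged illustration only, the sketch below computes a generic sequential-VAE ELBO: a per-step log-likelihood of binary feedback $y_t$ minus a KL term between an encoder posterior $q(z_t \mid \cdot)$ and a transition prior $p(z_t \mid z_{t-1})$. The Bernoulli likelihood, the diagonal-Gaussian posterior and prior, and all function names (`gaussian_kl`, `bernoulli_log_likelihood`, `sequential_elbo`) are assumptions made for illustration, not iDMIR's actual objective or encoder.

```python
import numpy as np

# Hypothetical sketch of a generic sequential-VAE ELBO. The likelihood and
# prior choices are assumptions, not iDMIR's published objective.


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p)))), summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def bernoulli_log_likelihood(y, p, eps=1e-8):
    """log p(y | z_t) for binary feedback y given predicted probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def sequential_elbo(ys, probs, post_mus, post_logvars, prior_mus, prior_logvars):
    """Sum over t of [log p(y_t | z_t) - KL(q(z_t | .) || p(z_t | z_{t-1}))].

    probs[t] are decoder outputs computed from one sample z_t ~ q, so the
    likelihood term is a single-sample Monte Carlo estimate. prior_mus[t] and
    prior_logvars[t] are assumed to come from a transition model given z_{t-1}.
    """
    elbo = 0.0
    for t in range(len(ys)):
        elbo += bernoulli_log_likelihood(ys[t], probs[t])
        elbo -= gaussian_kl(post_mus[t], post_logvars[t], prior_mus[t], prior_logvars[t])
    return elbo


# Illustrative usage with random stand-in values.
rng = np.random.default_rng(0)
T, d, k = 4, 8, 5                                   # steps, latent dim, slate size
ys = rng.integers(0, 2, size=(T, k)).astype(float)  # binary feedback per step
probs = rng.uniform(0.1, 0.9, size=(T, k))          # stand-in decoder outputs
post_mu, post_lv = rng.normal(size=(T, d)), np.zeros((T, d))
prior_mu, prior_lv = np.zeros((T, d)), np.zeros((T, d))
print(sequential_elbo(ys, probs, post_mu, post_lv, prior_mu, prior_lv))
```

Maximizing this quantity trades off reconstructing the observed feedback against keeping the posterior close to the transition prior; a real model would also learn the encoder and decoder that produce these arrays.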

Abstract

Existing model-based interactive recommendation systems are trained by querying a world model to capture user preferences, but a world model learned from historical logged data easily suffers from biases such as popularity bias and sampling bias. Several debiased methods have therefore been proposed recently. However, two essential drawbacks remain: 1) ignoring the dynamics of time-varying popularity leads to false reweighting of items; 2) treating unknown samples as negatives in negative sampling introduces sampling bias. To overcome these two drawbacks, we develop a model called identifiable Debiased Model-based Interactive Recommendation (iDMIR for short). In iDMIR, for the first drawback, we devise a debiased causal world model based on the causal mechanism of the time-varying recommendation generation process, with identification guarantees; for the second drawback, we devise a debiased contrastive policy, which coincides with debiased contrastive learning and avoids sampling bias. Moreover, we demonstrate that the proposed method not only outperforms several recent interactive recommendation algorithms but also produces diverse recommendations.
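
The abstract states only that the debiased contrastive policy coincides with debiased contrastive learning. For reference, the sketch below implements the standard debiased contrastive loss of Chuang et al. (2020) for a single anchor. Instead of treating every unknown sample as a true negative, it assumes unknown samples contain positives with prior probability `tau_plus` and subtracts their expected contribution from the negative term. Using a user state as the anchor and item embeddings as candidates is an assumption; how iDMIR constructs its positive and negative sequences is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def debiased_contrastive_loss(anchor, positive, negatives, tau_plus=0.1, temperature=0.5):
    """Debiased contrastive loss (Chuang et al., 2020) for a single anchor.

    anchor, positive: (d,) embeddings; negatives: (N, d) embeddings of unlabeled
    ("unknown") samples, some of which may in fact be positives.
    tau_plus: assumed prior probability that an unknown sample is a positive.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negatives)
    pos = np.exp(a @ p / temperature)   # score of the positive pair
    neg = np.exp(n @ a / temperature)   # (N,) scores of the unknown samples
    N = neg.shape[0]

    # Subtract the expected contribution of hidden positives from the negative term.
    ng = (neg.sum() - N * tau_plus * pos) / (1.0 - tau_plus)
    # For unit-norm embeddings each score is >= exp(-1/t), so clamp at N * exp(-1/t).
    ng = max(ng, N * np.exp(-1.0 / temperature))
    return -np.log(pos / (pos + ng))


# Illustrative usage with random embeddings (user state as anchor is an assumption).
rng = np.random.default_rng(0)
user_state = rng.normal(size=16)
clicked_item = user_state + 0.1 * rng.normal(size=16)
unlabeled_items = rng.normal(size=(32, 16))
print(debiased_contrastive_loss(user_state, clicked_item, unlabeled_items))
```

With `tau_plus=0` the correction vanishes and the loss reduces to the usual InfoNCE objective, which is exactly the "unknown samples are negatives" setting the abstract identifies as a source of sampling bias.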

Paper Structure

This paper contains 33 sections, 8 theorems, 36 equations, 6 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

(Identification of Debiased Estimation in Interactive Recommendation) Suppose that the joint distribution $P(\bm{G}_t, a_t, z_t, y_t, \bm{G}_{t-1}, a_{t-1}, z_{t-1}, y_{t-1}, \bm{s}_{t-1}^u)$ is recovered. Then Equation (equ:do_cal) can be estimated under the causal model sho…

Figures (6)

  • Figure 1: Illustration of the two drawbacks that lead to biased estimation. (a) The popularity of items changes over time. (b) Ignoring the dynamic popularity and the sampling bias gradually degrades the estimated probabilities of different items. (c) Negative sampling, which treats unknown samples as negatives, results in sampling bias. (Best viewed in color.)
  • Figure 2: An illustration of iDMIR. (a) Causal graph of the debiased causal world model in interactive recommendation. (b) How the debiased contrastive policy and the debiased causal world model fit into the model-based reinforcement learning framework. (c) The debiased contrastive policy uses positive and negative sequences to obtain an unbiased policy.
  • Figure 3: Illustration of the theoretical framework for identifying the debiased estimation.
  • Figure 4: Reward curves on different datasets.
  • Figure 5: Experimental results comparing DMIR-D with other model-free methods.
  • ...and 1 more figure
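
Figure 1 (a) and (b) attribute part of the bias to item popularity that shifts over time. The toy sketch below shows why a static popularity estimate can reweight items incorrectly. It uses hypothetical helper names and plain inverse-popularity weighting, not iDMIR's causal world model. Two items with identical global counts receive identical static weights, even though their popularity swaps between time windows.

```python
from collections import Counter, defaultdict


def static_ips_weights(log):
    """Inverse-popularity weights from global counts, ignoring time.

    log: list of (time_window, item) interactions.
    """
    counts = Counter(item for _, item in log)
    total = sum(counts.values())
    return {item: total / c for item, c in counts.items()}


def windowed_ips_weights(log):
    """Inverse-popularity weights estimated separately within each time window."""
    by_window = defaultdict(Counter)
    for window, item in log:
        by_window[window][item] += 1
    weights = {}
    for window, counts in by_window.items():
        total = sum(counts.values())
        for item, c in counts.items():
            weights[(window, item)] = total / c
    return weights


# Toy log: item "A" is popular in window 0, item "B" in window 1.
log = [(0, "A")] * 3 + [(0, "B")] + [(1, "A")] + [(1, "B")] * 3
print(static_ips_weights(log))    # {'A': 2.0, 'B': 2.0}
print(windowed_ips_weights(log))
```

The static weights treat both items identically, whereas the per-window weights give the early-popular item "A" a weight of 4/3 in window 0 and 4.0 in window 1, with the reverse for "B".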

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Proposition 4
  • Proof
  • Definition 1
  • Proposition 5
  • Proof
  • Theorem 1
  • Proof 1
  • ...and 5 more