Pre-trained Recommender Systems: A Causal Debiasing Perspective

Ziqian Lin; Hao Ding; Nghia Trong Hoang; Branislav Kveton; Anoop Deoras; Hao Wang

TL;DR

The paper tackles data scarcity and transferability in recommender systems by introducing PreRec, a pre-trained, causally debiased recommender built as a hierarchical Bayesian deep model. It explicitly models in-domain (popularity) and cross-domain (domain) confounders with latent variables ${\bf Z}_j$ and ${\bf D}_k$, respectively, and leverages universal item/user embeddings derived from textual content and user histories. PreRec supports multi-domain pre-training, zero-shot inference via $do$-calculus to remove cross-domain bias, and target-domain fine-tuning to capture domain-specific signals; across cross-market and cross-platform tests on real data, it consistently outperforms strong baselines in zero-shot and fine-tuning scenarios. This causal debiasing framework enables more reliable, data-efficient transfer of recommender capabilities to new markets or platforms.

Abstract

Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which has been less investigated from the perspective of pre-trained models. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be quickly adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data, which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different e-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.

Paper Structure (20 sections, 17 equations, 5 figures, 4 tables)

Introduction
Preliminary
Pre-trained Recommender Systems
Model Overview
Multi-domain Pre-training
Zero-shot Recommendation
Fine-tuning
Experiments
Dataset Processing and Statistics
Evaluated Methods
Evaluation Metrics
Zero-shot Experiments
Fine-tuning Experiments
Related Work
Conclusion
...and 5 more sections

Figures (5)

Figure 1: The probabilistic graphical model (PGM) for our PreRec. ${\bf U}_i$ and ${\bf H}_i$ represent user $i$ and the corresponding user history; ${\bf V}_j$, ${\bf X}_j$, and ${\bf Z}_j$ represent item $j$, its textual description (e.g., movie synopsis), and its popularity effect; ${\bf F}_j$ represents all the prominent factors that impact item popularity; ${\bf D}_k$ represents domain $k$. $K_s$ represents all the source domains, while $J_k$, $I_k$ represent all the items and all the users in domain $k$, respectively. $\lambda_u$, $\lambda_v$, $\lambda_d$, $\lambda_z$ are hyperparameters related to distribution variance.
Figure 2: For zero-shot recommendation in the target domain, we perform causal intervention on ${\bf U}_i$, ${\bf V}_j$, and ${\bf Z}_j$, i.e., $do({\bf U}_i=f_\textit{seq}({\bf D}_k = \mathbf{0}, {\bf H}_i))$, $do({\bf V}_j=f_\textit{e}({\bf D}_k = \mathbf{0}, {\bf Z}_j=f_\textit{pop}({\bf F}_j), f_\textit{BERT}({\bf X}_j)))$, and $do({\bf Z}_j=f_\textit{pop}({\bf F}_j))$, to remove the cross-domain bias while injecting the in-domain bias in the target domain.
Figure 3: Incremental fine-tuning results of different models. K%$=0.04\%$ for Cross-Market and K%$=0.5\%$ for Cross-Platform. The solid line indicates the model is pre-trained while the dashed line indicates the model is trained from scratch. The last point for each line corresponds to the full fine-tuning with all available training data in the target domain.
Figure 4: The neural network implementation for our PreRec. Analogous to Fig. 2 in the main paper, $f_{\text{BERT}}$ is implemented as a multilingual language model, $f_{\text{seq}}$ is implemented as a transformer decoder, and $f_{\text{pop}}$ is implemented as a linear layer with activation.
Figure 5: An illustration of how the popularity score works. The first column shows that different domains have different traffic volumes, and the second column shows that occupation is not comparable among different domains since different domains have different numbers of available items. The third and fourth columns show the distribution of the first and the second dimension of the popularity factor ${\bf F}$ in each domain, which is a normalized frequency proposed in this paper and shown to be more comparable among different domains. After pre-training, the popularity factor ${\bf F}$ is further mapped to the popularity score. As shown in the last column, the popularity score is comparable among zero-shot domains, indicating the generalizability of our proposed method for popularity.
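
The zero-shot intervention described in the Figure 2 caption can be sketched in a few lines of NumPy. This is only a toy illustration under stated assumptions, not the authors' implementation: the additive form of `f_e`, the mean-pooling stand-in for `f_seq` (the paper uses a transformer decoder), the `tanh` activation in `f_pop`, the dot-product scoring, and all array shapes and weights are hypothetical choices. What the sketch does take from the captions is the structure of the intervention: the domain variable ${\bf D}_k$ is set to $\mathbf{0}$, while the popularity effect ${\bf Z}_j$ is still computed from the target-domain popularity factor ${\bf F}_j$.

```python
import numpy as np

DIM = 8  # latent dimension (illustrative choice)

def f_pop(F, W, b):
    """Map popularity factors F (n_items, n_factors) to popularity effects Z.
    A linear layer with activation; tanh is an assumption."""
    return np.tanh(F @ W + b)

def f_e(D, Z, x_emb):
    """Combine domain effect, popularity effect, and text embedding.
    The additive form is an assumption made for this sketch."""
    return x_emb + Z + D

def f_seq(D, history_emb):
    """Stand-in for the sequence model: mean-pool the embeddings of the
    items in the user's history, shifted by the domain effect."""
    return history_emb.mean(axis=0) + D

def zero_shot_scores(history_idx, F, x_emb, W, b):
    D0 = np.zeros(DIM)             # intervention: D_k = 0 removes cross-domain bias
    Z = f_pop(F, W, b)             # do(Z_j = f_pop(F_j)): keep in-domain popularity
    V = f_e(D0, Z, x_emb)          # do(V_j = f_e(D_k = 0, Z_j, f_BERT(X_j)))
    U = f_seq(D0, V[history_idx])  # do(U_i = f_seq(D_k = 0, H_i))
    return V @ U                   # dot-product scoring (assumption)

# Toy target domain: 5 items, 2 popularity-factor dimensions.
rng = np.random.default_rng(0)
n_items, n_factors = 5, 2
F = rng.random((n_items, n_factors))     # hypothetical popularity factors
x_emb = rng.normal(size=(n_items, DIM))  # stands in for f_BERT(X_j)
W = rng.normal(size=(n_factors, DIM))    # hypothetical pre-trained weights
b = np.zeros(DIM)

scores = zero_shot_scores([0, 2], F, x_emb, W, b)
ranking = np.argsort(-scores)  # item indices, highest score first
```

In this sketch no target-domain parameter is learned: the only target-domain inputs are the item texts, the user history, and the popularity factors, which is what makes the recommendation zero-shot.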
