A Probabilistic Framework for Temporal Distribution Generalization in Industry-Scale Recommender Systems

Yuxuan Zhu, Cong Fu, Yabo Ni, Anxiang Zeng, Yuan Fang

TL;DR

This work addresses temporal distribution shift (TDS) in industrial recommender systems by introducing ELBO$_\text{TDS}$, a probabilistic framework that couples a causal generative model with a lightweight augmentation strategy that resamples time-varying features. The approach disentangles stable latent factors from time-varying signals using a four-term ELBO (reconstruction, entropy, prior, and predictive) that combines self-supervised and supervised components. Empirically, ELBO$_\text{TDS}$ outperforms invariant learning and SSL baselines on large industry datasets; ablations attribute the largest gains to augmenting statistical features, and the method remains robust under drastic distribution changes. It is designed to be plug-and-play in incremental training pipelines and delivers online gains (a 2.33\% uplift in GMV per user) while remaining scalable, motivating the release of a large industrial TDS benchmark and deployment in Shopee Product Search.

Abstract

Temporal distribution shift (TDS) erodes the long-term accuracy of recommender systems, yet industrial practice still relies on periodic incremental training, which struggles to capture both stable and transient patterns. Existing approaches such as invariant learning and self-supervised learning offer partial solutions but often suffer from unstable temporal generalization, representation collapse, or inefficient data utilization. To address these limitations, we propose ELBO$_\text{TDS}$, a probabilistic framework that integrates seamlessly into industry-scale incremental learning pipelines. First, we identify key shifting factors through statistical analysis of real-world production data and design a simple yet effective data augmentation strategy that resamples these time-varying factors to extend the training support. Second, to harness the benefits of this extended distribution while preventing representation collapse, we model the temporal recommendation scenario using a causal graph and derive a self-supervised variational objective, ELBO$_\text{TDS}$, grounded in the causal structure. Extensive experiments, supported by both theoretical and empirical analysis, demonstrate that our method achieves superior temporal generalization and yields a 2.33\% uplift in GMV per user. It has been successfully deployed in Shopee Product Search. Code is available at https://github.com/FuCongResearchSquad/ELBO4TDS.

Paper Structure

This paper contains 50 sections, 1 theorem, 10 equations, 8 figures, 9 tables, 1 algorithm.

Key Result

Proposition 1

Let $\mathbf{x}_u$ and $\mathbf{x}_i$ be user and item features with their respective augmented views $\mathbf{X}_u=\{\mathbf{x}_u,\mathbf{x}^{+,0}_u,\mathbf{x}^{-,1}_u,\dots,\mathbf{x}^{-,k}_u\}$ and $\mathbf{X}_i=\{\mathbf{x}_i,\mathbf{x}^{+,0}_i,\mathbf{x}^{-,1}_i,\dots,\mathbf{x}^{-,k}_i\}$. the InfoN where $c=2\log k$ and $k$ is the number of negative samples.
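
The view sets in Proposition 1 pair each sample with one positive view and $k$ negative views, which is the usual setup for an InfoNCE-style contrastive loss. As a rough illustration only, the sketch below computes a standard InfoNCE loss for a single anchor in NumPy. It is not the authors' implementation: the function name `info_nce`, the cosine similarity, and the `temperature` value are assumptions made for this example, and the inputs are taken to be already-encoded representation vectors.

```python
import numpy as np


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor, one positive, and k negatives.

    anchor, positive: arrays of shape (d,)
    negatives:        array of shape (k, d)
    temperature:      softmax temperature (assumed hyperparameter)
    """
    def normalize(v):
        # L2-normalize along the last axis so dot products are cosine similarities.
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)
    pos_logit = normalize(positive) @ a      # scalar similarity to the positive
    neg_logits = normalize(negatives) @ a    # shape (k,), similarities to negatives

    # The positive is placed at index 0 of the logit vector.
    logits = np.concatenate(([pos_logit], neg_logits)) / temperature

    # Numerically stable log-softmax evaluated at the positive's index.
    logits = logits - logits.max()
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos


# Usage with k = 2 negatives in a 3-dimensional space.
e1, e2, e3 = np.eye(3)
loss_aligned = info_nce(e1, e1, np.stack([e2, e3]))     # positive matches the anchor
loss_misaligned = info_nce(e1, e2, np.stack([e1, e3]))  # a negative matches the anchor
```

When the positive view is aligned with the anchor the loss is close to zero, and when a negative is the closest view the loss is large. If all $k+1$ views are identical the loss equals $\log(k+1)$, which shows how the number of negatives sets the scale of the loss.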

Figures (8)

  • Figure 1: Causal graph representing the data generation process under the temporal distribution shift perspective. Subscripts $u$ and $i$ indicate user-specific and item-specific variables, respectively. $\mathbf{v}$ denotes time-varying factors, $\mathbf{s}$ denotes relatively stable factors, $\mathbf{z}$ denotes latent variables (representations), $\mathbf{x}$ denotes observed samples, and $\mathrm{y}$ denotes labels. Black arrows indicate directions of causal dependencies.
  • Figure 2: Statistical analysis of the distribution shift for statistical (a), sequential (b), and categorical (c) features, which fluctuate significantly and contribute to the sample TDS. The right side (d) illustrates the architecture of ELBO$_\text{TDS}$.
  • Figure 3: Scaling-over-time capability on Shopee-Small. Each day's checkpoint is tested on the next day's data, on which it has not been trained.
  • Figure 4: Parameter sensitivity of $p$ and $\alpha$ on Kuairand-1K.
  • Figure 5: Parameter sensitivity of $p$ and $\alpha$ on Shopee-Small.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Proposition 1