Capturing User Interests from Data Streams for Continual Sequential Recommendation

Gyuseok Lee, Hyunsik Yoo, Junyoung Hwang, SeongKu Kang, Hwanjo Yu

TL;DR

This work tackles the problem of continual sequential recommendation with Transformer-based SR models that must adapt to new user behavior while preserving long-term preferences. It introduces CSTRec, a continual, linear-attention-based framework featuring Continual Sequential Attention (CSA) with Cauchy-Schwarz Normalization (CSN) and Collaborative Interest Enrichment (CIE), plus Pseudo-Historical Knowledge Assignment to support new users. Empirical results on Gowalla, ML-1M, and Yelp show CSTRec achieves superior retention and acquisition (RA/LA) and higher H-mean compared to state-of-the-art continual baselines, while maintaining efficiency through linear-time attention and periodic enrichment. The method demonstrates strong potential for real-world, privacy-conscious, non-stationary SR scenarios by effectively tracking user-interest trajectories across data streams.

Abstract

Transformer-based sequential recommendation (SR) models excel at modeling long-range dependencies in user behavior via self-attention. However, updating them with continuously arriving behavior sequences incurs high computational costs or leads to catastrophic forgetting. Although continual learning, a standard approach for non-stationary data streams, has recently been applied to recommendation, existing methods gradually forget long-term user preferences and remain underexplored in SR. In this paper, we introduce Continual Sequential Transformer for Recommendation (CSTRec). CSTRec is designed to effectively adapt to current interests by leveraging well-preserved historical ones, thus capturing the trajectory of user interests over time. The core of CSTRec is Continual Sequential Attention (CSA), a linear attention tailored for continual SR, which enables CSTRec to partially retain historical knowledge without direct access to prior data. CSA has two key components: (1) Cauchy-Schwarz Normalization that stabilizes learning over time under uneven user interaction frequencies; (2) Collaborative Interest Enrichment that alleviates forgetting through shared, learnable interest pools. In addition, we introduce a new technique to facilitate the adaptation of new users by transferring historical knowledge from existing users with similar interests. Extensive experiments on three real-world datasets show that CSTRec outperforms state-of-the-art models in both knowledge retention and acquisition.
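The abstract's claim that linear attention can retain historical knowledge without access to prior data can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an identity feature map, a single head, and a hypothetical function name `linear_attention_stream`. The normalizer is only one possible choice motivated by the Cauchy-Schwarz inequality, $\lVert \mathbf{q}^\top \mathbf{S} \rVert \le \lVert \mathbf{q} \rVert \, \lVert \mathbf{S} \rVert_F$; the exact CSN formula is not given in this summary.

```python
import numpy as np

def linear_attention_stream(Q, K, V, state=None, eps=1e-8):
    """Causal linear attention over one data block.

    The whole history is summarized by a fixed-size state S = sum_j k_j v_j^T,
    so a later block can be processed from S alone, without the earlier items.
    Assumption: identity feature map and a norm-based normalizer (illustrative).
    """
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v)) if state is None else state.copy()
    out = np.empty((Q.shape[0], d_v))
    for i in range(Q.shape[0]):
        S += np.outer(K[i], V[i])        # fold item i into the running state
        raw = Q[i] @ S                   # unnormalized attention output
        # Bound from Cauchy-Schwarz: ||q S|| <= ||q|| * ||S||_F,
        # so each normalized output has norm at most 1.
        denom = np.linalg.norm(Q[i]) * np.linalg.norm(S) + eps
        out[i] = raw / denom
    return out, S

# Processing a stream in two blocks with a carried state
# gives the same outputs as processing it in one pass.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(10, 4)) for _ in range(3))
full, _ = linear_attention_stream(Q, K, V)
first, state = linear_attention_stream(Q[:6], K[:6], V[:6])
second, _ = linear_attention_stream(Q[6:], K[6:], V[6:], state=state)
print(np.allclose(full, np.vstack([first, second])))
```

The state has size $d_k \times d_v$ regardless of how many items have been seen, which is why the per-item cost stays constant as the stream grows. Because the normalizer bounds every output, users with long interaction histories do not produce larger attention outputs than users with short ones, which is the kind of instability the abstract attributes to uneven interaction frequencies.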

Paper Structure

This paper contains 29 sections, 16 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of CSTRec, illustrating the computation of attention output $\mathbf{a}^t_i$ for the $i$-th item in the incoming sequence $S^t_u$. Following the Multi-head CSA, we apply the same transformer sublayer architecture (\ref{subsub:transformer}): dropout, layer normalization, and the position-wise feed-forward network. AGG denotes the aggregation of historical and current knowledge in Eq. (\ref{eq:provide_enrich}).
  • Figure 2: Effects of CSN on Yelp. (Left) Magnitude distribution of linear attention output. (Right) Validation performance. For 'w/o CSN', we apply layer normalization to the attention outputs to promote stable training.
  • Figure 3: Hit@20 results on Gowalla across three user groups. (Blue: IMSR, Orange: SAIL-PIW, Green: CSTRec)
  • Figure 4: Impact of the number and length of interests.
  • Figure 5: t-SNE visualization of interest pools.