Revisiting Self-attention for Cross-domain Sequential Recommendation

Clark Mingxuan Ju, Leonardo Neves, Bhuvesh Kumar, Liam Collins, Tong Zhao, Yuwei Qiu, Qing Dou, Sohail Nizam, Sen Yang, Neil Shah

TL;DR

This work tackles cross-domain sequential recommendation (CDSR) by reframing cross-domain transfer as a Pareto-aware optimization of the self-attention mechanism, reducing reliance on heavy domain-specific blocks. The authors introduce AutoCDSR, which minimizes cross-domain attention unless such transfer improves the primary recommendation task, and AutoCDSR+, which constrains transfer through Pareto-optimal information bottleneck tokens. They cast training as a two-task multi-objective problem and balance the two objectives with preference-aware Pareto optimization based on MGDA and Frank-Wolfe-inspired updates, yielding a plug-and-play module that attaches to standard transformers such as SASRec and BERT4Rec. Experiments on multiple academic and industrial datasets show significant CDSR gains with minimal computational overhead, and attention analyses reveal that AutoCDSR suppresses harmful cross-domain interactions while preserving beneficial ones. The approach offers a practical path to deploying cross-domain modeling in production, bridging the gap between state-of-the-art CDSR methods and scalable, domain-agnostic recommender systems.
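
The two-task balancing described above can be illustrated with the closed-form min-norm step that MGDA reduces to when there are exactly two objectives. This is a minimal sketch, not the authors' implementation: the function names, the gradient arguments `g_rec` and `g_attn`, and the use of NumPy are assumptions made for illustration, and the preference-aware variant mentioned in the summary is not modeled here.

```python
import numpy as np


def two_task_min_norm_weight(g_rec, g_attn):
    """Weight alpha in [0, 1] minimizing ||alpha * g_rec + (1 - alpha) * g_attn||^2.

    g_rec and g_attn are hypothetical flattened gradients of the
    recommendation loss and the cross-domain attention loss.
    """
    diff = g_rec - g_attn
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: every weight yields the same direction.
        return 0.5
    # Unconstrained minimizer of the quadratic in alpha, then clipped to [0, 1].
    alpha = float((g_attn - g_rec) @ g_attn) / denom
    return min(1.0, max(0.0, alpha))


def combined_direction(g_rec, g_attn):
    """Convex combination of the two gradients with the min-norm weight."""
    alpha = two_task_min_norm_weight(g_rec, g_attn)
    return alpha * g_rec + (1.0 - alpha) * g_attn


if __name__ == "__main__":
    g_rec = np.array([1.0, 0.0])
    g_attn = np.array([0.0, 1.0])
    print(two_task_min_norm_weight(g_rec, g_attn))
    print(combined_direction(g_rec, g_attn))
```

For two orthogonal gradients of equal length the weight is 0.5; when one gradient is a shorter, aligned copy of the other, the weight clips so that the shorter gradient is used alone.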

Abstract

Sequential recommendation is a popular paradigm in modern recommender systems. In particular, one challenging problem in this space is cross-domain sequential recommendation (CDSR), which aims to predict future behaviors given user interactions across multiple domains. Existing CDSR frameworks are mostly built on the self-attention transformer and seek improvements by explicitly injecting additional domain-specific components (e.g., domain-aware module blocks). While these additional components help, we argue they overlook the core self-attention module already present in the transformer, a naturally powerful tool for learning correlations among behaviors. In this work, we aim to improve CDSR performance for simple models from a novel perspective: enhancing the self-attention itself. Specifically, we introduce a Pareto-optimal self-attention and formulate cross-domain learning as a multi-objective problem, where we optimize the recommendation task while dynamically minimizing the cross-domain attention scores. Our approach automates knowledge transfer in CDSR (dubbed AutoCDSR): it not only mitigates negative transfer but also encourages complementary knowledge exchange among auxiliary domains. Building on this idea, we further introduce AutoCDSR+, a more performant variant at slight additional cost. Our proposal is easy to implement and works as a plug-and-play module that can be incorporated into existing transformer-based recommenders. Besides being flexible, it is practical to deploy because it adds little computational overhead and requires no heavy hyperparameter tuning. On average, AutoCDSR improves Recall@10 for SASRec and BERT4Rec by 9.8% and 16.0% and NDCG@10 by 12.0% and 16.7%, respectively. Code is available at https://github.com/snap-research/AutoCDSR.
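
To make "cross-domain attention scores" concrete, the sketch below measures how much attention mass each query position places on keys from a different domain. It is an assumption-laden illustration rather than the paper's definition: the function name, the single-head `(L, L)` attention layout, and the simple mean over queries are all hypothetical, and aggregation over layers and heads is left out.

```python
import numpy as np


def cross_domain_attention_mass(attn, domain_ids):
    """Mean attention mass that queries place on keys from another domain.

    attn:       (L, L) array, row i holds query i's softmax weights over keys.
    domain_ids: length-L sequence of integer domain labels, one per position.
    Both argument conventions are assumptions for this sketch.
    """
    attn = np.asarray(attn, dtype=float)
    domain_ids = np.asarray(domain_ids)
    # cross_mask[i, j] is True when query i and key j come from different domains.
    cross_mask = domain_ids[:, None] != domain_ids[None, :]
    # Sum cross-domain weights per query, then average over queries.
    return float((attn * cross_mask).sum(axis=1).mean())


if __name__ == "__main__":
    domains = [0, 0, 1, 1]
    uniform = np.full((4, 4), 0.25)  # every query attends equally to all keys
    print(cross_domain_attention_mass(uniform, domains))
```

A quantity of this kind could serve as the second objective alongside the recommendation loss; how the authors actually normalize and aggregate it is not specified in the text here.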

Paper Structure

This paper contains 24 sections, 14 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Cross-domain sequences with different characteristics, where tiles indicate behavior semantics. No static learning pattern can handle knowledge transfer well in all scenarios, owing to the asymmetric distribution of domains and inadvertent noise from additional domains.
  • Figure 2: Cross-domain and single-domain attention scores (averaged over all layers and heads) of BERT4Rec trained with cross-domain sequences on two datasets. Negative transfer occurs when the model attends too much to unnecessary cross-domain information even though single-domain knowledge is sufficient (i.e., the 3rd group).
  • Figure 3: The distribution of attention scores across different strata in KuaiRand-1K with AutoCDSR deployed. Cross-domain attention scores for samples suffering from negative transfer are reduced significantly by AutoCDSR.
  • Figure 4: Performance of the base BERT4Rec model supervised by the additional attention loss with different weights. Base transformers are sensitive to the choice of weight, and no single value is optimal for all domains.
  • Figure 5: Task weight trajectory derived by AutoCDSR.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Definition 1: Pareto Optimality
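
For orientation, the standard textbook notion of Pareto optimality for a two-objective problem can be written as follows. The paper's own Definition 1 is not reproduced in this listing, so the wording and the symbols $\mathcal{L}_{\mathrm{rec}}$, $\mathcal{L}_{\mathrm{attn}}$, and $\theta$ are assumptions chosen to match the two-task setting described in the abstract.

```latex
% Standard definition, stated for two losses; symbols are illustrative.
A parameter vector $\theta^{*}$ is \emph{Pareto optimal} if there is no
$\theta$ such that
\[
  \mathcal{L}_{i}(\theta) \le \mathcal{L}_{i}(\theta^{*})
  \quad \text{for all } i \in \{\mathrm{rec}, \mathrm{attn}\}
  \qquad \text{and} \qquad
  \mathcal{L}_{j}(\theta) < \mathcal{L}_{j}(\theta^{*})
  \quad \text{for some } j \in \{\mathrm{rec}, \mathrm{attn}\}.
\]
```

In words, no other solution improves one objective without worsening the other.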