Revisiting Self-attention for Cross-domain Sequential Recommendation
Clark Mingxuan Ju, Leonardo Neves, Bhuvesh Kumar, Liam Collins, Tong Zhao, Yuwei Qiu, Qing Dou, Sohail Nizam, Sen Yang, Neil Shah
TL;DR
This work tackles cross-domain sequential recommendation (CDSR) by reframing cross-domain transfer as a Pareto-aware optimization of the self-attention mechanism, reducing reliance on heavy domain-specific blocks. The authors introduce AutoCDSR, which minimizes cross-domain attention unless such transfer improves the primary recommendation task, and AutoCDSR+, which constrains transfer through Pareto-optimal information bottleneck tokens. They cast training as a two-objective learning problem (the recommendation task and the cross-domain attention scores) and balance the two objectives with preference-aware Pareto optimization based on MGDA and Frank-Wolfe-inspired updates, yielding a plug-and-play module that attaches to standard transformers such as SASRec and BERT4Rec. Experiments on multiple academic and industrial datasets show significant CDSR gains with minimal computational overhead, and attention analyses indicate that AutoCDSR suppresses harmful cross-domain interactions while preserving beneficial ones. The approach offers a practical path to deploying cross-domain modeling in production, narrowing the gap between state-of-the-art CDSR methods and scalable, domain-agnostic recommender systems.
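To make the MGDA step concrete, here is a minimal sketch of the standard two-objective min-norm weighting that MGDA-style methods build on. It is not the authors' implementation: the preference-aware variant is not reproduced, and the names `min_norm_weight`, `g_rec`, and `g_attn` are hypothetical. Given the gradients of the two objectives, it finds the convex combination with the smallest norm, which has a closed form when there are only two tasks.

```python
import numpy as np

def min_norm_weight(g1, g2, eps=1e-12):
    """Return alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2.

    Closed-form two-task case of the min-norm problem used by MGDA-style
    methods. Illustrative only; names and defaults are assumptions.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom < eps:
        # Gradients are (almost) identical, so any weighting is equivalent.
        return 0.5
    alpha = float((g2 - g1) @ g2) / denom
    # Clip to keep the result a convex combination.
    return min(1.0, max(0.0, alpha))

# Hypothetical gradients: g_rec for the recommendation loss,
# g_attn for the cross-domain attention penalty.
g_rec = np.array([1.0, 0.0])
g_attn = np.array([0.0, 1.0])

alpha = min_norm_weight(g_rec, g_attn)
combined = alpha * g_rec + (1.0 - alpha) * g_attn
```

With orthogonal gradients of equal norm, as in the usage above, the two objectives are weighted equally. When one gradient already has the smaller norm and points the same way as the other, the weight clips to 0 or 1 and that gradient is used alone.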
Abstract
Sequential recommendation is a popular paradigm in modern recommender systems. One challenging problem in this space is cross-domain sequential recommendation (CDSR), which aims to predict future behaviors given user interactions across multiple domains. Existing CDSR frameworks are mostly built on the self-attention transformer and seek to improve it by explicitly injecting additional domain-specific components (e.g., domain-aware module blocks). While these additional components help, we argue that they overlook the core self-attention module already present in the transformer, a naturally powerful tool for learning correlations among behaviors. In this work, we aim to improve the CDSR performance of simple models from a novel perspective: enhancing the self-attention itself. Specifically, we introduce a Pareto-optimal self-attention and formulate cross-domain learning as a multi-objective problem, in which we optimize the recommendation task while dynamically minimizing the cross-domain attention scores. Our approach, dubbed AutoCDSR, automates knowledge transfer in CDSR: it not only mitigates negative transfer but also encourages complementary knowledge exchange among auxiliary domains. Building on this idea, we further introduce AutoCDSR+, a more performant variant with slight additional cost. Our proposal is easy to implement and works as a plug-and-play module that can be incorporated into existing transformer-based recommenders. Besides being flexible, it is practical to deploy because it adds little computational overhead and does not require heavy hyper-parameter tuning. On average, AutoCDSR improves Recall@10 for SASRec and BERT4Rec by 9.8% and 16.0%, and NDCG@10 by 12.0% and 16.7%, respectively. Code is available at https://github.com/snap-research/AutoCDSR.
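As an illustration of what a "cross-domain attention score" could look like, the sketch below measures how much attention mass each position places on items from a different domain. This is an assumption made for exposition, not the definition used in the paper, which may normalize or aggregate differently across heads and layers. The names `cross_domain_attention`, `attn`, and `domains` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_domain_attention(attn, domains):
    """Average attention mass that queries place on keys from another domain.

    attn:    (T, T) row-stochastic attention weights for one head.
    domains: length-T sequence of domain ids, one per position.
    Illustrative definition only (an assumption, not the paper's).
    """
    domains = np.asarray(domains)
    # cross[i, j] is True when query i and key j belong to different domains.
    cross = domains[:, None] != domains[None, :]
    return float((attn * cross).sum(axis=-1).mean())

# Toy sequence: four interactions, two from each domain, uniform attention.
attn = softmax(np.zeros((4, 4)))
domains = [0, 0, 1, 1]
score = cross_domain_attention(attn, domains)

# A single-domain sequence has no cross-domain attention by construction.
single = cross_domain_attention(attn, [0, 0, 0, 0])
```

In a multi-objective setup of the kind the abstract describes, a quantity like `score` would serve as the second objective, minimized alongside the recommendation loss rather than added to it with a fixed weight.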
