Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models
Yuhao Wang, Junwei Pan, Pengyue Jia, Wanyu Wang, Maolin Wang, Zhixiang Feng, Xiaotian Li, Jie Jiang, Xiangyu Zhao
TL;DR
The paper tackles cold-start and distribution shift in sequential recommendation (SR) by integrating large language model (LLM) textual signals with collaborative item dynamics. It introduces PAD, a three-phase framework: (1) pre-train the LLM and the recommender; (2) align textual and collaborative embeddings with a characteristic loss based on multi-kernel maximum mean discrepancy (MK-MMD) and Rec-Anchored guidance; and (3) fine-tune a Triple-Experts architecture with frequency-aware gating to disentangle the modalities. Empirical results on three public datasets show state-of-the-art performance, with particular gains on cold items and strong compatibility with multiple SR backbones. The work offers a practical, model-agnostic approach to incorporating rich textual context into SR while mitigating catastrophic forgetting and preserving collaborative semantics.
Abstract
Sequential Recommendation (SR) aims to leverage the sequential patterns in users' historical interactions to accurately track their preferences. However, existing SR methods rely primarily on collaborative data, which leads to challenges such as the cold-start problem and sub-optimal performance. Meanwhile, despite the proven effectiveness of large language models (LLMs), their integration into commercial recommender systems is impeded by high inference latency, an inability to capture all distribution statistics, and catastrophic forgetting. To address these issues, we introduce a novel Pre-train, Align, and Disentangle (PAD) framework to enhance SR models with LLMs. Specifically, we first pre-train both the SR model and the LLM to obtain collaborative and textual embeddings. Next, we propose a characteristic recommendation-anchored alignment loss based on multi-kernel maximum mean discrepancy (MK-MMD) with Gaussian kernels. Finally, a triple-experts architecture, comprising aligned and modality-specific experts with disentangled embeddings, is fine-tuned in a frequency-aware manner. Experimental results on three public datasets validate the efficacy of PAD, showing substantial improvements, particularly for cold items, and compatibility with various SR backbone models. The code and datasets are available for reproduction at https://github.com/Applied-Machine-Learning-Lab/PAD.
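
To make the alignment idea concrete, the following is a minimal sketch of a multi-kernel MMD estimate between two sets of embeddings, using an equal-weight sum of Gaussian kernels. It is not the authors' implementation: the function names, the bandwidth values, the equal kernel weights, and the toy data are all assumptions chosen for illustration, and the recommendation-anchored part of the loss is not shown.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mk_mmd2(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD, averaged over several Gaussian kernels.

    The bandwidths and the equal weighting are illustrative assumptions.
    """
    total = 0.0
    for bw in bandwidths:
        k_xx = gaussian_kernel(x, x, bw).mean()
        k_yy = gaussian_kernel(y, y, bw).mean()
        k_xy = gaussian_kernel(x, y, bw).mean()
        total += k_xx + k_yy - 2.0 * k_xy
    return total / len(bandwidths)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for textual and collaborative item embeddings.
    textual = rng.normal(size=(64, 8))
    collab_similar = rng.normal(size=(64, 8))
    collab_shifted = rng.normal(loc=2.0, size=(64, 8))
    print("similar distributions:", mk_mmd2(textual, collab_similar))
    print("shifted distributions:", mk_mmd2(textual, collab_shifted))
```

In this sketch a smaller value means the two embedding distributions are closer, so minimizing it as a loss would pull the textual embedding distribution toward the collaborative one. Using several bandwidths at once avoids committing to a single length scale.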
