Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models

Yuhao Wang, Junwei Pan, Pengyue Jia, Wanyu Wang, Maolin Wang, Zhixiang Feng, Xiaotian Li, Jie Jiang, Xiangyu Zhao

TL;DR

The paper tackles cold-start and distribution shift in sequential recommendation (SR) by combining textual signals from large language models (LLMs) with collaborative item dynamics. It introduces PAD, a three-phase framework that (1) pre-trains the LLM and the recommender, (2) aligns textual and collaborative embeddings with a characteristic, recommendation-anchored loss based on multi-kernel maximum mean discrepancy (MK-MMD), and (3) fine-tunes a triple-experts architecture with frequency-aware gating to disentangle the modalities. Empirical results on three public datasets show state-of-the-art performance, with particular gains on cold items and compatibility with multiple SR backbones. The work offers a practical, model-agnostic way to bring rich textual context into SR while mitigating catastrophic forgetting and preserving collaborative semantics.

Abstract

Sequential Recommendation (SR) aims to leverage the sequential patterns in users' historical interactions to accurately track their preferences. However, the primary reliance of existing SR methods on collaborative data results in challenges such as the cold-start problem and sub-optimal performance. Concurrently, despite the proven effectiveness of large language models (LLMs), their integration into commercial recommender systems is impeded by issues such as high inference latency, incomplete capture of all distribution statistics, and catastrophic forgetting. To address these issues, we introduce a novel Pre-train, Align, and Disentangle (PAD) framework to enhance SR models with LLMs. In particular, we initially pre-train both the SR and LLM models to obtain collaborative and textual embeddings. Subsequently, we propose a characteristic recommendation-anchored alignment loss using multi-kernel maximum mean discrepancy with Gaussian kernels. Lastly, a triple-experts architecture, comprising aligned and modality-specific experts with disentangled embeddings, is fine-tuned in a frequency-aware manner. Experimental results on three public datasets validate the efficacy of PAD, indicating substantial enhancements and compatibility with various SR backbone models, particularly for cold items. The code and datasets are accessible for reproduction at https://github.com/Applied-Machine-Learning-Lab/PAD.
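
To make the alignment objective more concrete, the following is a minimal sketch of a squared multi-kernel MMD estimate with Gaussian kernels between two batches of embeddings. It is an illustration under stated assumptions, not the authors' implementation: the function names, the bandwidth values, the equal kernel weights, and the use of the biased estimator are all assumptions, and the recommendation-anchored part of the loss is not covered. It also assumes the textual embeddings have already been projected to the same dimension as the collaborative ones.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mk_mmd2(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD, averaged over several Gaussian kernels.

    The bandwidths and the equal weighting are illustrative assumptions.
    """
    total = 0.0
    for bw in bandwidths:
        k_xx = gaussian_kernel(x, x, bw).mean()
        k_yy = gaussian_kernel(y, y, bw).mean()
        k_xy = gaussian_kernel(x, y, bw).mean()
        total += k_xx + k_yy - 2.0 * k_xy
    return total / len(bandwidths)


# Toy usage with random stand-ins for collaborative and (projected) textual embeddings.
rng = np.random.default_rng(0)
collab = rng.normal(size=(64, 8))
collab_again = rng.normal(size=(64, 8))          # same distribution as collab
textual_shifted = rng.normal(loc=1.0, size=(64, 8))  # shifted distribution

same_dist = mk_mmd2(collab, collab_again)
shifted = mk_mmd2(collab, textual_shifted)
```

In this toy setup the estimate for the shifted batch is larger than for the batch drawn from the same distribution, which is the behaviour an alignment loss relies on: minimizing it pulls the two embedding distributions together.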

Paper Structure

This paper contains 35 sections, 8 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overall framework of PAD. The numbers in parentheses (128 and 4096) denote the embedding dimensions. The prediction logit is calculated by multiplying the sequence embedding (the output of the recommendation model) by the target item embedding. For simplicity, the multiplication operation is omitted.
  • Figure 2: Comparison of the original SASRec and PAD on the MIND (left), Electronics (mid) and Prime Pantry (right) datasets. Warm, median, and cold denote target items with high, medium, and low frequency in the test set. The y-axis shows HR@10.
  • Figure 3: $\mathcal{P}_{\text{Top-10\% }}^\text{ID}$ and $\mathcal{P}_{\text{Bottom-10\% }}^\text{ID}$ under the distance distribution regarding the collaborative and textual embeddings in original SASRec, SMEM, CTRL, and our proposed PAD on cold items.
  • Figure 4: Comparison of anchored and non-anchored alignment losses on the MIND (left), Electronics (mid) and Prime Pantry (right) datasets. The y-axis shows HR@10.
  • Figure 5: (a) Comparison of alignment losses and (b) ablation study comparing different model variants on the MIND dataset.
  • ...and 2 more figures
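
Two quantities in the captions above are simple enough to sketch: the prediction logit from Figure 1 (the product of the sequence embedding and the target item embedding) and the HR@10 metric on the y-axes of Figures 2 and 4. The snippet below is a hedged illustration only. The function names are hypothetical, the tie-handling is one possible convention, and the paper's exact evaluation protocol (for example, full ranking versus sampled negatives) is not stated in this section.

```python
import numpy as np


def prediction_logits(seq_emb, item_emb):
    """Dot-product logits: seq_emb is (batch, d), item_emb is (n_items, d)."""
    return seq_emb @ item_emb.T


def hit_rate_at_k(logits, targets, k=10):
    """Fraction of rows whose target item ranks within the top k.

    The rank is the number of items scored strictly higher than the target,
    so ties are resolved in the target's favour (an assumption of this sketch).
    """
    targets = np.asarray(targets)
    target_scores = logits[np.arange(len(targets)), targets]
    ranks = (logits > target_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())


# Toy usage: 4 items with one-hot embeddings, 2 user sequences.
item_emb = np.eye(4)
seq_emb = np.eye(4)[[0, 1]]
logits = prediction_logits(seq_emb, item_emb)
targets = [0, 2]

hr_at_1 = hit_rate_at_k(logits, targets, k=1)
hr_at_2 = hit_rate_at_k(logits, targets, k=2)
```

Here the first sequence scores its target highest, while the second places one other item above its target, so the second target counts as a hit at k=2 but not at k=1.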