A Contextual-Aware Position Encoding for Sequential Recommendation

Jun Yuan, Guohao Cai, Zhenhua Dong

TL;DR

The paper tackles the mismatch between NLP-style position encoding and sequential recommendation (SR) by introducing CAPE, a context-dependent, dissimilarity-based position encoding that uses gate-based fusion and interpolation to align item and position embeddings. CAPE derives a position $p_j$ for each context item from target-context dissimilarity and uses an interpolated embedding $e[p_j]$ to influence the attention logits, giving SR an efficient and flexible position representation. It yields consistent improvements across multiple SR backbones on public benchmarks, reaching state-of-the-art performance, and online A/B testing confirms real-world impact in the form of eCPM gains. The work highlights the importance of SR-specific position encoding and shows that CAPE scales to large models and is robust across fusion strategies, suggesting broad applicability to future SR tasks.

Abstract

Sequential recommendation (SR), which encodes user activity to predict the next action, has emerged as a widely adopted strategy in developing commercial personalized recommendation systems. A critical component of modern SR models is the attention mechanism, which synthesizes users' historical activities. This mechanism is typically order-invariant and generally relies on position encoding (PE). Conventional SR models simply assign a learnable vector to each position, resulting in only modest gains compared to traditional recommendation models. Moreover, limited research has been conducted on position encoding tailored for sequential recommendation, leaving a significant gap in addressing its unique requirements. To bridge this gap, we propose a novel Contextual-Aware Position Encoding method for sequential recommendation, abbreviated as CAPE. To the best of our knowledge, CAPE is the first PE method specifically designed for sequential recommendation. Comprehensive experiments conducted on benchmark SR datasets demonstrate that CAPE consistently enhances multiple mainstream backbone models and achieves state-of-the-art performance across both small and large model sizes. Furthermore, we deployed CAPE in an industrial setting on a real-world commercial platform, clearly showcasing the effectiveness of our approach. Our source code is available at https://github.com/yjdy/CAPE.

Paper Structure

This paper contains 24 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of mainstream sequential recommendation models. Only the target attention mechanism is shown, which can be seen as computing the self-attention output for the last item.
  • Figure 2: Overview of CAPE. CAPE first computes the dissimilarity between the target item and each context item, then accumulates the dissimilarity values to obtain the position of each context item. As a result, CAPE tends to assign the same position to similar context items and different positions to dissimilar ones. CAPE can also be adapted to self-attention easily.
  • Figure 3: Ablation study on context length and position embedding dimension on AmazonElectronics. With a position embedding dimension of 32, CAPE outperforms all baselines across all context lengths.
  • Figure 4: Comparison of different fusion methods on AmazonElectronics. The naive fusion method degrades DIN, and the None method degrades SASRec. CAPE significantly improves models of both attention types.
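
The position computation described in the Figure 2 caption and the TL;DR can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the dissimilarity is taken to be 1 - sigmoid(q · k_j), values are accumulated from the most recent context item backwards, and the embedding lookup e[p_j] is a linear interpolation between the two nearest integer positions. The function names (`contextual_positions`, `interpolate_embedding`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contextual_positions(target, context):
    """Return one fractional position per context item.

    `context` is ordered from oldest to most recent.
    ASSUMPTION: dissimilarity = 1 - sigmoid(target . item); the paper's
    exact dissimilarity function may differ.
    """
    gates = [1.0 - sigmoid(dot(target, item)) for item in context]
    positions = []
    running = 0.0
    # ASSUMPTION: accumulate from the most recent item backwards, so items
    # similar to the target add little and keep nearly the same position.
    for gate in reversed(gates):
        running += gate
        positions.append(running)
    positions.reverse()
    return positions


def interpolate_embedding(table, p):
    """Look up a fractional position in a table of integer-position embeddings.

    ASSUMPTION: linear interpolation between the two neighbouring rows,
    with p clamped to the valid index range of the table.
    """
    p = min(max(p, 0.0), len(table) - 1)
    lo = math.floor(p)
    hi = min(lo + 1, len(table) - 1)
    frac = p - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(table[lo], table[hi])]


if __name__ == "__main__":
    target = [1.0, 0.0]
    # Two items similar to the target, followed by one dissimilar item.
    context = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    table = [[0.0], [2.0], [4.0]]  # toy position embeddings for positions 0..2
    for p in contextual_positions(target, context):
        print(round(p, 4), interpolate_embedding(table, p))
```

In this sketch, consecutive context items that resemble the target add only small increments, so they end up at nearly the same position, while a dissimilar item pushes older items further away. Because positions are fractional, the interpolation step is what makes an embedding lookup possible without rounding.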