Positional encoding is not the same as context: A study on positional encoding for sequential recommendation

Alejo Lopez-Avila, Jinhua Du, Abbas Shimary, Ze Li

TL;DR

This paper disentangles positional encoding from temporal context in sequential recommendation, showing that positional encodings provide relational cues between items that the temporal footprint alone does not. It introduces Rotatory encoding and its concatenated variant, and evaluates them across eight Amazon datasets in terms of both accuracy and training stability. RMHA-4 offers the strongest stability on high-deviation datasets, while the Rotatory variants perform best on low-deviation data; extended training makes the effect of each encoding more visible. The authors argue that careful encoding selection is essential for robust, reliable transformer-based SRS, and report state-of-the-art results at the time of the work's initial preprint.

Abstract

The rapid growth of streaming media and e-commerce has driven advancements in recommendation systems, particularly Sequential Recommendation Systems (SRS). These systems employ users' interaction histories to predict future preferences. While recent research has focused on architectural innovations like transformer blocks and feature extraction, positional encodings, crucial for capturing temporal patterns, have received less attention. These encodings are often conflated with contextual information, such as the temporal footprint, which previous works tend to treat as interchangeable with positional information. This paper highlights the critical distinction between temporal footprint and positional encodings, demonstrating that the latter offers unique relational cues between items, which the temporal footprint alone cannot provide. Through extensive experimentation on eight Amazon datasets and subsets, we assess the impact of various encodings on performance metrics and training stability. We introduce new positional encodings and investigate integration strategies that improve both metrics and stability, surpassing state-of-the-art results at the time of this work's initial preprint. Importantly, we demonstrate that selecting the appropriate encoding is not only key to better performance but also essential for building robust, reliable SRS models.

Paper Structure (47 sections, 17 equations, 6 figures, 21 tables)

This paper contains 47 sections, 17 equations, 6 figures, 21 tables.

Figures (6)

  • Figure 1: Illustration of our proposed Rotatory embeddings. Similar to RoPE, it encodes positional information through the angles of the embeddings. However, unlike RoPE, which assigns angles proportionally to position, Rotatory learns these angles during training, allowing the model to determine the most relevant positional information and adjust its significance accordingly.
  • Figure 2: (1) Vector encoding added before the Transformer blocks, (2) First head RPE: relative encoding applied only to the first block (RopeOne), and (3) All head RPE: relative encoding integrated into every block (RMHA-4).
  • Figure 3: APE vs. RPE: APE encodings introduce the positional information as vectors before the $V$, $Q$ and $K$ projections, whereas RPE adds this information at the coefficient level to $K$ and $Q$. Parts affected by the positional information appear in red.
  • Figure 4: Losses randomly selected from None (first row) and RMHA-4 (second row) for the test set with $0.0001$ and silu. Dataset: Fashion
  • Figure 5: Losses randomly selected from None (first row) and RMHA-4 (second row) for the test set with $0.0001$ and silu. Dataset: Men
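The Figure 1 caption contrasts RoPE, whose rotation angles grow proportionally with position, with Rotatory, whose angles are learned. The plain-Python sketch below illustrates that contrast under stated assumptions: the learned angles are modeled as a hypothetical table with one free angle per (position, dimension pair), randomly initialized rather than trained, and applied as 2-D rotations to consecutive embedding dimensions. These choices and the names `LearnedAngleTable`, `rope_angles` and `rotate_pairs` are illustrative and are not taken from the paper.

```python
import math
import random


def rotate_pairs(vec, angles):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by angles[i]."""
    if len(vec) != 2 * len(angles):
        raise ValueError("need exactly one angle per dimension pair")
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_angles(pos, dim, base=10000.0):
    """RoPE-style angles: pair i is rotated by pos * base**(-2i/dim)."""
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]


class LearnedAngleTable:
    """Hypothetical learned-angle encoding: one free angle per (position, pair).

    In a trained model these would be parameters updated by gradient
    descent; here they are only randomly initialized floats.
    """

    def __init__(self, max_len, dim, seed=0):
        rng = random.Random(seed)
        self.angles = [
            [rng.uniform(-math.pi, math.pi) for _ in range(dim // 2)]
            for _ in range(max_len)
        ]

    def encode(self, vec, pos):
        return rotate_pairs(vec, self.angles[pos])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


if __name__ == "__main__":
    q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
    # RoPE: the score depends only on the offset (5 - 2 == 10 - 7).
    s1 = dot(rotate_pairs(q, rope_angles(2, 4)), rotate_pairs(k, rope_angles(5, 4)))
    s2 = dot(rotate_pairs(q, rope_angles(7, 4)), rotate_pairs(k, rope_angles(10, 4)))
    print(math.isclose(s1, s2, abs_tol=1e-9))  # True
    # Free per-position angles: the same offset need not give the same score.
    table = LearnedAngleTable(max_len=16, dim=4)
    t1 = dot(table.encode(q, 2), table.encode(k, 5))
    t2 = dot(table.encode(q, 7), table.encode(k, 10))
    print(t1, t2)
```

With proportional angles, the score between positions 2 and 5 matches the score between positions 7 and 10, because only the offset matters. With a free angle per position, that relative-offset property is no longer guaranteed, although each rotation still preserves vector norms. Whether the paper's Rotatory constrains its learned angles in some further way is not stated in this summary.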
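Figures 2 and 3 describe where positional information enters the model. The sketch below is a minimal illustration, assuming a single attention head, toy weight matrices, a sinusoidal position vector for the APE case and RoPE-style rotation for the RPE case; the function names and the strategy labels `vector`, `first` and `all` are hypothetical. Under APE the position vector is added before the projections, so $V$, $Q$ and $K$ all change with position; under RPE only $Q$ and $K$ are rotated, so the attention coefficients carry position while $V$ does not. `rpe_blocks` mirrors the three layouts listed for Figure 2.

```python
import math


def matvec(w, x):
    """Multiply a (d_out x d_in) matrix, given as nested lists, by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sinusoidal(pos, dim, base=10000.0):
    """Absolute position vector (sin/cos per dimension pair), used for APE."""
    out = []
    for i in range(dim // 2):
        theta = pos * base ** (-2 * i / dim)
        out.extend([math.sin(theta), math.cos(theta)])
    return out


def rotate_pairs(vec, pos, base=10000.0):
    """RoPE-style rotation of each dimension pair by pos * base**(-2i/dim)."""
    dim = len(vec)
    out = []
    for i in range(dim // 2):
        theta = pos * base ** (-2 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def project(x, wq, wk, wv, pos, mode):
    """Return (q, k, v) for one token at position `pos`.

    mode="ape": the position vector is added to x before the projections,
                so q, k and v all depend on position.
    mode="rpe": q and k are rotated after projection; v is position-free.
    """
    if mode == "ape":
        x = [xi + pi for xi, pi in zip(x, sinusoidal(pos, len(x)))]
        return matvec(wq, x), matvec(wk, x), matvec(wv, x)
    if mode == "rpe":
        q, k = matvec(wq, x), matvec(wk, x)
        return rotate_pairs(q, pos), rotate_pairs(k, pos), matvec(wv, x)
    raise ValueError(f"unknown mode: {mode}")


def rpe_blocks(strategy, num_blocks):
    """Which blocks apply relative encoding under the three Figure 2 layouts."""
    if strategy == "vector":  # (1) encoding added once, before the blocks
        return [False] * num_blocks
    if strategy == "first":  # (2) RopeOne-style: first block only
        return [i == 0 for i in range(num_blocks)]
    if strategy == "all":  # (3) RMHA-4-style: every block
        return [True] * num_blocks
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, `rpe_blocks("first", 4)` returns `[True, False, False, False]`, meaning only the first block would rotate its queries and keys.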