Positional encoding is not the same as context: A study on positional encoding for sequential recommendation
Alejo Lopez-Avila, Jinhua Du, Abbas Shimary, Ze Li
TL;DR
This paper disentangles positional encoding from temporal context in sequential recommendation, showing that encodings provide unique relational cues beyond the temporal footprint. It introduces Rotatory encoding and its concatenated variant, and performs extensive evaluations across eight Amazon datasets to measure both accuracy and training stability. The study finds that RMHA-4 offers the strongest stability on high-deviation datasets, while Rotatory variants excel on low-deviation data, with extended training further illuminating encoding effects. The findings argue that careful encoding selection is essential for robust, reliable transformer-based Sequential Recommendation Systems (SRS), and the proposed encodings surpass state-of-the-art results at the time of the work's initial preprint.
Abstract
The rapid growth of streaming media and e-commerce has driven advancements in recommendation systems, particularly Sequential Recommendation Systems (SRS). These systems employ users' interaction histories to predict future preferences. While recent research has focused on architectural innovations like transformer blocks and feature extraction, positional encodings, crucial for capturing temporal patterns, have received less attention. These encodings are often conflated with contextual information, such as the temporal footprint, which previous works tend to treat as interchangeable with positional information. This paper highlights the critical distinction between temporal footprint and positional encodings, demonstrating that the latter offer unique relational cues between items, which the temporal footprint alone cannot provide. Through extensive experimentation on eight Amazon datasets and subsets, we assess the impact of various encodings on performance metrics and training stability. We introduce new positional encodings and investigate integration strategies that improve both metrics and stability, surpassing state-of-the-art results at the time of this work's initial preprint. Importantly, we demonstrate that selecting the appropriate encoding is not only key to better performance but also essential for building robust, reliable SRS models.
