Look into the Future: Deep Contextualized Sequential Recommendation

Lei Zheng; Ning Li; Yanhuan Huang; Ruiwen Xu; Weinan Zhang; Yong Yu

TL;DR

The paper addresses the challenge of modeling evolving user interests in sequential recommendation by using future information without temporal data leakage. It proposes LIFT, a retrieval-based framework that builds an interaction context from the target user's past behaviors and from the past and future behaviors of similar interactions retrieved from other users, and pairs it with a masked-behavior pretraining objective for learning contextual representations. The model comprises a sequence encoder built on a decoder-only Transformer, a BM25-based retriever, and a key-based attention predictor that fuses the target sample, the user's history, and the retrieved context to predict user responses. LIFT achieves consistent gains over strong baselines on click-through rate (CTR) prediction and rating prediction, which indicates that retrieving context from the global user pool and learning it in a self-supervised way benefits sequential recommendation.

Abstract

Sequential recommendation aims to estimate how a user's interests evolve over time by uncovering valuable patterns in the user's behavior history. Many previous sequential models have relied solely on users' historical information to model the evolution of their interests, neglecting the crucial role that future information plays in accurately capturing these dynamics. However, effectively incorporating future information into sequential modeling is non-trivial, since the current-step prediction for a target user cannot draw on that user's own future data. In this paper, we propose a novel framework for sequential recommendation called Look into the Future (LIFT), which builds and leverages the contexts of sequential recommendation. In LIFT, the context of a target user's interaction is represented based on i) the user's own past behaviors and ii) the past and future behaviors of similar interactions retrieved from other users. As such, the learned context is more informative and effective in predicting the target user's behaviors in sequential recommendation, without temporal data leakage. Furthermore, to exploit the intrinsic information embedded within the context itself, we introduce a pretraining method that incorporates behavior masking. In our extensive experiments on five real-world datasets, LIFT achieves significant performance improvements on click-through rate prediction and rating prediction tasks in sequential recommendation over strong baselines, demonstrating that retrieving and leveraging relevant contexts from the global user pool greatly benefits sequential recommendation. The experiment code is provided at https://anonymous.4open.science/r/LIFT-277C/Readme.md.
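
The context construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Interaction` record, the `build_context` helper, the `window` parameter, and the "same item" similarity test are all hypothetical stand-ins (the paper uses a retriever rather than exact item matching), and the leakage guard, which requires a retrieved interaction and its future window to precede the target timestamp, is an assumption about how temporal leakage could be avoided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    user: str
    item: str
    ts: int  # timestamp of the interaction


def build_context(log, target, window=2):
    """Return (history, retrieved) for a target interaction.

    history   : items the target user interacted with before target.ts
    retrieved : (past_items, item, future_items) triples from other users
    """
    # Group the log by user, ordered by time.
    by_user = {}
    for x in sorted(log, key=lambda x: x.ts):
        by_user.setdefault(x.user, []).append(x)

    # i) The target user's own PAST behaviors only.
    history = [x.item for x in by_user.get(target.user, []) if x.ts < target.ts]

    # ii) Past and future behaviors around similar interactions of OTHER users.
    retrieved = []
    for user, seq in by_user.items():
        if user == target.user:
            continue
        for i, x in enumerate(seq):
            if x.item != target.item:  # toy similarity: same item (assumption)
                continue
            past = seq[max(0, i - window):i]
            future = seq[i + 1:i + 1 + window]
            # Leakage guard (assumption): everything used must already have
            # happened before the target interaction.
            if all(y.ts < target.ts for y in [x, *future]):
                retrieved.append(
                    ([y.item for y in past], x.item, [y.item for y in future])
                )
    return history, retrieved
```

In this sketch, "future" is relative to the retrieved interaction, not to the target: the retrieved user's later behaviors are usable because they were already observed when the target prediction is made.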

Paper Structure (20 sections, 18 equations, 8 figures, 6 tables)

Introduction
Related Work
Formulation & Preliminaries
The LIFT Framework
Overview
Encoder
Retriever
Predictor
Time Complexity & Speedup
Experiments
Datasets
Evaluation Metrics
Compared Methods
Overall Performance (RQ1)
Further Analysis
...and 5 more sections

Figures (8)

Figure 1: Comparison between LIFT and conventional models. a) Traditional models rely solely on the current user and item information when making predictions. b) Sequential models typically incorporate the user's historical interactions to capture evolving interests over time. c) Retrieval-based models retrieve relevant behaviors from far back in the user's history to build the user profile for prediction. d) LIFT focuses on the interaction context, which covers both the historical and the future sequence of interactions for each user-item interaction.
Figure 2: Architecture of the encoder and predictor in the LIFT framework. (a) Pretrained sequence encoder: the embedding layer is omitted in this component. LIFT employs a decoder-only Transformer architecture as the encoder, which is pretrained with the mask behavior loss. The pretraining stage focuses on the contextual information inherent in the sequence data itself. (b) Training of the predictor: in the training stage, only the predictor is trained. LIFT uses three types of information for its predictions: the target sample $x_t$ itself, the user's historical interactions, and the retrieved context, which covers historical interactions from similar instances as well as future interactions. In this phase, the label information used for training comes exclusively from the target samples.
Figure 3: Overview of the LIFT workflow. In the initial phase (Stage 1), an encoder is pretrained to convert sequences into embeddings. The figure illustrates the data flow of the subsequent phase (Stage 2). In Step 1, $x_z$ is sent to the retriever as a query. In Step 2, the retriever outputs the retrieval result. In Step 3, the retrieval result is fed into the encoder. In Step 4, the encoder produces the embeddings. In Step 5, $x_z$ and the encoded embeddings are sent to the predictor to obtain the final result.
Figure 4: Performance w.r.t. different pretraining mask rates.
Figure 5: Inference time on Alipay.
...and 3 more figures
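
The behavior masking used for pretraining (Figure 2a), whose rate is varied in Figure 4, can be sketched roughly as follows. This is a hedged illustration only: the `MASK` token, the `mask_behaviors` helper, and independent per-position masking are assumptions, not details taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def mask_behaviors(seq, mask_rate, rng):
    """Mask each behavior independently with probability `mask_rate`.

    Returns the masked sequence and a dict {position: original behavior}
    that a pretraining loss would ask the encoder to reconstruct.
    """
    masked, targets = [], {}
    for pos, behavior in enumerate(seq):
        if rng.random() < mask_rate:
            masked.append(MASK)
            targets[pos] = behavior
        else:
            masked.append(behavior)
    return masked, targets
```

A call such as `mask_behaviors(["a", "b", "c"], 0.3, random.Random(0))` returns the partially masked sequence together with the positions to be predicted. Passing an explicit `random.Random` instance keeps the masking reproducible.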
