Spatio-Temporal Contrastive Learning Enhanced GNNs for Session-based Recommendation

Zhongwei Wan, Xin Liu, Benyou Wang, Jiezhong Qiu, Boyu Li, Ting Guo, Guangyong Chen, Yang Wang

TL;DR

RESTC tackles the loss of temporal information in graph-based session recommendation by introducing a spatio-temporal contrastive learning framework that aligns the temporal and spatial views of a session. It combines an MGAT-based spatial encoder with a Session Transformer temporal encoder and strengthens the spatial view with a global Collaborative Filtering Graph (CFG), enabling cross-view interaction and improving next-item prediction. On six public datasets, RESTC consistently outperforms state-of-the-art baselines, and ablations confirm the contributions of the temporal encoder, the CFG augmentation, and cross-view contrastive learning. The work advances SBR by bridging spatial and temporal representations in a unified latent space, offering a robust, model-agnostic way to exploit both local session structure and global co-occurrence information.
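The cross-view alignment described above is typically implemented with an InfoNCE-style loss. The sketch below is a generic version, not necessarily the paper's exact objective: it assumes each session has one spatial embedding (e.g. from the GNN encoder) and one temporal embedding (e.g. from the sequence encoder), treats the matching pair as the positive, uses the other sessions' temporal embeddings in the batch as negatives, and scores pairs with cosine similarity at a temperature `tau` of 0.2. The function names and the temperature value are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def cross_view_infonce(spatial, temporal, tau=0.2):
    """Hypothetical InfoNCE loss between two views of the same batch of sessions.

    spatial[i] and temporal[i] are two views of session i (positive pair);
    temporal[j] for j != i serve as in-batch negatives. Returns the mean loss.
    """
    n = len(spatial)
    total = 0.0
    for i in range(n):
        # Similarity of session i's spatial view to every temporal view, scaled by tau.
        logits = [cosine(spatial[i], temporal[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log softmax probability assigned to the positive pair
        total += log_denominator - logits[i]
    return total / n
```

When the two views of each session point in the same direction, the positive logit dominates and the loss is small; shuffling the temporal views so that each spatial embedding is paired with another session's temporal embedding makes the loss much larger. Minimizing this loss pulls the two views of a session together (alignment) while pushing different sessions apart (uniformity).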

Abstract

Session-based recommendation (SBR) systems aim to use a user's short-term behavior sequence to predict the next item without a detailed user profile. Most recent works model user preference by treating sessions as between-item transition graphs and use various graph neural networks (GNNs) to encode the pair-wise relations between items and their neighbors. Some existing GNN-based models aggregate information mainly from the spatial graph structure, ignoring the temporal relations among an item's neighbors during message passing; this information loss leads to suboptimal recommendations. Other works tackle this challenge by incorporating additional temporal information but lack sufficient interaction between the spatial and temporal patterns. To address this issue, and inspired by the uniformity and alignment properties of contrastive learning, we propose a novel framework called Session-based Recommendation with Spatio-Temporal Contrastive Learning Enhanced GNNs (RESTC). The idea is to supplement the GNN-based main supervised recommendation task with the temporal representation through an auxiliary cross-view contrastive learning mechanism. Furthermore, a novel global collaborative filtering graph (CFG) embedding is leveraged to enhance the spatial view in the main task. Extensive experiments show that RESTC significantly outperforms state-of-the-art baselines, with improvements of up to 27.08% on HR@20 and 20.10% on MRR@20.
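HR@20 and MRR@20 are the standard top-K metrics for next-item prediction. The sketch below uses their common definitions, assuming each test session yields a ranked list of candidate item ids and a single ground-truth next item; the paper's exact evaluation protocol (for example, ranking over all items) is not described here, and the function names are illustrative.

```python
def hit_rate_at_k(ranked_lists, targets, k=20):
    """HR@K: fraction of sessions whose true next item is in the top-k list."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)


def mrr_at_k(ranked_lists, targets, k=20):
    """MRR@K: mean reciprocal rank of the true item; ranks below k count as 0."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        top = ranked[:k]
        if t in top:
            total += 1.0 / (top.index(t) + 1)  # ranks are 1-based
    return total / len(targets)
```

For example, with ranked lists `[[3, 7, 1], [5, 2, 9], [4, 8, 6]]`, targets `[7, 9, 1]`, and `k=3`, two of the three targets are retrieved (HR@3 = 2/3), at ranks 2 and 3 (MRR@3 = (1/2 + 1/3) / 3 = 5/18). HR only checks whether the item appears in the list, while MRR also rewards placing it near the top.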

Paper Structure (46 sections, 16 equations, 10 figures, 5 tables, 1 algorithm)

Figures (10)

  • Figure 1: Two distinct sessions may be represented as the same graph if temporal information is omitted, indicating that temporal patterns should be considered explicitly to supplement GNN-based models for the SBR task.
  • Figure 2: Sample sessions from a real public dataset. The numbers in the nodes denote item indices.
  • Figure 3: Three essential types of information in session data: (A) the temporal view of a session is a behavioral sequence that captures the user's dynamic preference along its timeline; (B) the spatial view of a session is a directed between-item transition graph, each edge of which indicates a behavior shift from the source item to the target item (for example, a user clicked item $v_2$ after $v_1$). Note that the behavior shift associated with an edge may occur many times in a session, and such edges are orthogonal to time; (C) collaborative filtering information from other sessions can be extracted from a global weighted graph and then used to complement the item profiles in a short-term session.
  • Figure 4: Overview of RESTC. MGAT, SESTrans, and CFG encoder denote the Multi-relational Graph Attention Network, the Session Transformer, and the Collaborative Filtering Graph encoder, respectively, each described in its own section of the paper.
  • Figure 5: Results of RESTC with different GNN-based spatial encoders. GSLT denotes GraphSAGE-LSTM; MGAT w/o MH denotes single-head MGAT.
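The point of Figure 1 and the global graph of Figure 3(C) can be illustrated with a small sketch. It assumes a session is a list of item ids and represents the spatial view as the set of directed transition edges between consecutive items. The global graph construction shown here (undirected edge weights counting how often two distinct items appear within `window` positions of each other, summed over all sessions) is a hypothetical stand-in, not the paper's definition of the CFG; `session_graph` and `global_cfg` are illustrative names.

```python
from collections import Counter


def session_graph(session):
    """Spatial view: set of directed transition edges (src, dst) between consecutive items."""
    return {(a, b) for a, b in zip(session, session[1:])}


def global_cfg(sessions, window=1):
    """Hypothetical global co-occurrence graph over all sessions.

    Each undirected edge (a frozenset of two distinct items) is weighted by how
    many times the two items appear within `window` positions of each other.
    """
    weights = Counter()
    for s in sessions:
        for i, a in enumerate(s):
            for b in s[i + 1:i + 1 + window]:
                if a != b:
                    weights[frozenset((a, b))] += 1
    return weights


if __name__ == "__main__":
    a = [1, 2, 3, 2, 1]
    b = [2, 1, 2, 3, 2]
    # Different click orders, identical transition graphs: the order is lost.
    print(a == b, session_graph(a) == session_graph(b))  # False True
```

Sessions `[1, 2, 3, 2, 1]` and `[2, 1, 2, 3, 2]` both produce the edge set {(1, 2), (2, 1), (2, 3), (3, 2)}, so a purely graph-based encoder cannot tell them apart, which is why a temporal view is needed. Aggregating over sessions in `global_cfg` adds cross-session co-occurrence signal that a single short session lacks.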