Contrastive Pre-training for Deep Session Data Understanding

Zixuan Li; Lizi Liao; Yunshan Ma; Tat-Seng Chua

TL;DR

This paper tackles understanding user behavior from semi-structured e-commerce session data by introducing the User Behavior Model (UBM), a two-level Transformer that jointly encodes textual data (item descriptions and search queries) and user action sequences. It adopts a two-stage contrastive pre-training regime with multiple session- and item-level augmentations to learn robust representations, followed by lightweight task-specific fine-tuning for purchase intention prediction, remaining length prediction, and next-item prediction. Empirical results on a real-world dataset show that UBM outperforms general-domain and domain-adapted baselines across all downstream tasks, with particular strength in handling data sparsity and in capturing intra-item semantics, inter-item relationships, and inter-interaction dependencies. The work demonstrates the practical impact of unified, contrastively pre-trained representations for diverse e-commerce tasks and provides a principled framework for augmentations and evaluation of session-based models.

Abstract

Session data has been widely used for understanding users' behavior in e-commerce. Researchers leverage session data for different tasks, such as purchase intention prediction, remaining length prediction, and recommendation, as it provides contextual clues about the user's dynamic interests. However, online shopping session data is semi-structured and complex in nature: it contains both unstructured textual data about products and search queries, and structured user action sequences. Most existing works focus on leveraging coarse-grained item sequences for specific tasks, while largely ignoring the fine-grained information from text and user action details. In this work, we delve into deep session data understanding by scrutinizing the various clues inside the rich information in user sessions. Specifically, we propose to pre-train a general-purpose User Behavior Model (UBM) over large-scale session data with rich details, such as product titles, attributes, and various kinds of user actions. A two-stage pre-training scheme is introduced to encourage the model to self-learn from various augmentations with contrastive learning objectives, which span different granularity levels of session data. The pre-trained session understanding model can then be easily fine-tuned for various downstream tasks. Extensive experiments show that UBM better captures the complex intra-item semantic relations, inter-item connections, and inter-interaction dependencies, leading to large performance gains over the baselines on several downstream tasks. It also demonstrates strong robustness when data is sparse.
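
The abstract refers to contrastive learning objectives without giving their form on this page. As a minimal sketch, assuming an InfoNCE-style loss with in-batch negatives (a common choice in contrastive pre-training, not confirmed here as the authors' exact objective), the idea can be illustrated in plain Python. The function names, the cosine similarity, and the `temperature` value are illustrative assumptions, and embeddings are assumed to be non-zero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    positives[i] is the augmented view of anchors[i]; every other row
    of `positives` acts as an in-batch negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the true positive.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Two toy session embeddings and their augmented views.
batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(batch, batch)           # views match their anchors
shuffled_loss = info_nce(batch, batch[::-1])    # views paired with the wrong anchors
```

The loss is small when each anchor is closest to its own augmented view and large when the pairing is wrong, which is the signal that pulls views of the same session or item together during pre-training.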

Paper Structure (36 sections, 12 equations, 6 figures, 2 tables)

This paper contains 36 sections, 12 equations, 6 figures, 2 tables.

Introduction
Related Work
E-commerce Session Data Modelling
Pre-training
Contrastive Learning
Approach
UBM Network Architecture
Input Session Data Details
Low-level Interaction Encoder
High-level Session Encoder
Two-stage Pre-training
Pre-training Stage 1
Pre-training Stage 2
Data Augmentation Strategies
Dropout Masking
...and 21 more sections

Figures (6)

Figure 1: Illustration of a typical e-commerce user session. The user interacts with products, searches for products, and adds the target item to the cart through a sequence of interactions.
Figure 2: Overview of the proposed UBM model. It is designed as a two-level hierarchical structure with two stages of pre-training. The low-level BERT-based Interaction Encoder is first pre-trained to capture intra-item semantics and inter-item connections. Then the whole UBM model is further pre-trained to encourage the high-level Transformer-based Session Encoder to learn inter-interaction dependencies, and to allow the Interaction Encoder to further capture inter-action relations. After pre-training, a few simple task-specific layers are plugged in for downstream task fine-tuning.
Figure 3: Augmentation Strategies. Dropout Masking is automatically applied on all inputs; Item Token Masking and Next Item Pairing are used for item level augmentation; Behavior Reordering and Action and Item Token Masking are used for session level augmentation. By using these augmentation strategies in a two-stage pre-training, we manage to capture the complex intra-item semantic relations, inter-item connections and inter-interaction dependencies.
Figure 4: Visualization of example sessions' representations learned by UBM versus BERT$_{mini}$.
Figure 5: Models' NIP performance on different item groups.
...and 1 more figures
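
The augmentation strategies named in the Figure 3 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a session is modelled as a list of `(action, item_tokens)` pairs, `[MASK]` is a hypothetical mask token, Behavior Reordering is assumed to shuffle a short window of consecutive interactions, and the masking rate and window size are arbitrary. The paper's exact procedures are not given on this page and may differ.

```python
import random

MASK = "[MASK]"  # hypothetical mask token


def mask_item_tokens(tokens, rate, rng):
    """Item Token Masking: replace each token with MASK with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def next_item_pairs(session):
    """Next Item Pairing: pair each item's tokens with the next item's tokens."""
    items = [tokens for _, tokens in session]
    return list(zip(items, items[1:]))


def reorder_behaviors(session, rng, window=2):
    """Behavior Reordering (assumed form): shuffle one window of consecutive interactions."""
    if len(session) < window:
        return list(session)
    start = rng.randrange(len(session) - window + 1)
    chunk = session[start:start + window]
    rng.shuffle(chunk)
    return session[:start] + chunk + session[start + window:]


def mask_actions_and_tokens(session, rate, rng):
    """Action and Item Token Masking: mask action labels and item tokens independently."""
    augmented = []
    for action, tokens in session:
        new_action = MASK if rng.random() < rate else action
        augmented.append((new_action, mask_item_tokens(tokens, rate, rng)))
    return augmented


# A toy session with made-up actions and product titles.
session = [
    ("click", ["red", "running", "shoes"]),
    ("search", ["trail", "shoes"]),
    ("add_to_cart", ["blue", "trail", "shoes"]),
]
rng = random.Random(0)
item_view = mask_item_tokens(session[0][1], 0.3, rng)
pairs = next_item_pairs(session)
reordered_view = reorder_behaviors(session, rng)
masked_view = mask_actions_and_tokens(session, 0.3, rng)
```

Each function returns a new view and leaves the input session unchanged, so an original session and its augmented view can be fed to a contrastive objective as a positive pair.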
