Contrastive Pre-training for Deep Session Data Understanding
Zixuan Li, Lizi Liao, Yunshan Ma, Tat-Seng Chua
TL;DR
This paper tackles the problem of understanding user behavior from semi-structured e-commerce session data by introducing the User Behavior Model (UBM), a two-level Transformer that jointly encodes item text, search queries, and user action sequences. It adopts a two-stage contrastive pre-training regime with multiple session- and item-level augmentations to learn robust representations, followed by lightweight task-specific fine-tuning for purchase intention prediction, remaining length prediction, and next-item prediction. Empirical results on a real-world dataset show that UBM outperforms general-domain and domain-adapted baselines across all downstream tasks, with particular strength in handling data sparsity and in capturing intra-item semantics, inter-item relationships, and inter-interaction dependencies. The work demonstrates the practical impact of unified, contrastively pre-trained representations for diverse e-commerce tasks and provides a principled framework for the augmentation and evaluation of session-based models.
Abstract
Session data has been widely used for understanding user behavior in e-commerce. Researchers leverage session data for different tasks, such as purchase intention prediction, remaining length prediction, and recommendation, as it provides contextual clues about the user's dynamic interests. However, online shopping session data is semi-structured and complex in nature: it contains both unstructured textual data (product information and search queries) and structured user action sequences. Most existing works focus on leveraging coarse-grained item sequences for specific tasks, while largely ignoring the fine-grained information in text and user action details. In this work, we delve into deep session data understanding by scrutinizing the various clues inside the rich information in user sessions. Specifically, we propose to pre-train a general-purpose User Behavior Model (UBM) over large-scale session data with rich details, such as product titles, attributes, and various kinds of user actions. A two-stage pre-training scheme, spanning different granularity levels of session data, is introduced to encourage the model to self-learn from various augmentations with contrastive learning objectives. The pre-trained session understanding model can then be easily fine-tuned for various downstream tasks. Extensive experiments show that UBM better captures the complex intra-item semantic relations, inter-item connections, and inter-interaction dependencies, leading to large performance gains over the baselines on several downstream tasks. It also demonstrates strong robustness when data is sparse.
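
The abstract does not state the exact form of the contrastive objectives. As a rough illustration only, the sketch below assumes an InfoNCE-style loss with cosine similarity, in which two augmented views of the same session form a positive pair and the other sessions in the batch act as negatives. The function names, the `temperature` parameter, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, not the paper's).

    view_a[i] and view_b[i] are embeddings of two augmentations of session i;
    every view_b[j] with j != i serves as a negative for view_a[i].
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: in `aligned`, each session's two views match;
# in `shuffled`, the views are paired with the wrong session.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this toy setup the loss is near zero when the two views of each session agree and much larger when they are mismatched, which is the behavior a contrastive pre-training stage relies on to pull augmented views of the same session together.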
