Enhancing ID and Text Fusion via Alternative Training in Session-based Recommendation

Juanhui Li; Haoyu Han; Zhikai Chen; Harry Shomer; Wei Jin; Amin Javari; Jiliang Tang

TL;DR

This paper tackles the challenge of effectively combining ID-based and textual signals in session-based recommendation. It identifies an imbalance in naive fusion, where ID information often dominates, and proposes AlterRec, which uses two dedicated uni-modal networks trained via an alternating scheme that leverages inter-modal predictions through hard negatives and augmented positives. The method yields superior performance across multiple real-world datasets, improves long-tail item recommendations, and benefits from augmentation; ablations confirm the necessity of hard negatives and cross-modal interaction. The result is a practical, scalable approach that enhances text utilization in recommendations and offers a path for integrating stronger language-model-based representations in future work.

Abstract

Session-based recommendation has gained increasing attention in recent years, with its aim to offer tailored suggestions based on users' historical behaviors within sessions. To advance this field, a variety of methods have been developed, with ID-based approaches typically demonstrating promising performance. However, these methods often face challenges with long-tail items and overlook other rich forms of information, notably valuable textual semantic information. To integrate text information, various methods have been introduced, mostly following a naive fusion framework. Surprisingly, we observe that fusing these two modalities under the naive fusion framework does not consistently outperform the best single modality. Further investigation reveals a potential imbalance issue in naive fusion, where the ID modality dominates and the text modality is undertrained. This suggests that the unexpected observation may stem from naive fusion's failure to effectively balance the two modalities, often over-relying on the stronger ID modality, and that naive fusion might not be as effective in combining ID and text as previously expected. To address this, we propose a novel alternative training strategy, AlterRec. It separates the training of ID and text, thereby avoiding the imbalance issue seen in naive fusion. Additionally, AlterRec introduces a novel strategy to facilitate the interaction between the two modalities, enabling them to learn from each other and integrate the text more effectively. Comprehensive experiments demonstrate the effectiveness of AlterRec in session-based recommendation. The implementation is available at https://github.com/Juanhui28/AlterRec.
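The alternating scheme summarized in the TL;DR and abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It only assumes that, while one uni-modal network is trained, the other network's ranked predictions supply extra positives (the top-ranked items) and hard negatives (a band of highly ranked items just below them). The function names, the parameters `num_pos`, `neg_start`, `neg_end`, and `block`, and the rank-band rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def rank_items(scores: Dict[str, float]) -> List[str]:
    """Return item ids sorted by descending score (ties broken by item id)."""
    return sorted(scores, key=lambda item: (-scores[item], item))


def select_cross_modal_samples(
    other_scores: Dict[str, float],
    target: str,
    num_pos: int,
    neg_start: int,
    neg_end: int,
) -> Tuple[List[str], List[str]]:
    """Pick augmented positives and hard negatives from the OTHER network's scores.

    Assumption (hypothetical rule): the top `num_pos` non-target items act as
    augmented positives, and the items at ranks [neg_start, neg_end) act as
    hard negatives. Ranks are 0-based and computed after removing the target.
    """
    if not (num_pos <= neg_start <= neg_end):
        raise ValueError("expected num_pos <= neg_start <= neg_end")
    ranked = [item for item in rank_items(other_scores) if item != target]
    augmented_positives = ranked[:num_pos]
    hard_negatives = ranked[neg_start:neg_end]
    return augmented_positives, hard_negatives


def alternating_schedule(num_epochs: int, block: int) -> List[str]:
    """Name the network updated in each epoch, switching every `block` epochs.

    Assumption: training starts with the ID network; the text network is
    frozen while the ID network trains, and vice versa.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    return [
        "id" if (epoch // block) % 2 == 0 else "text"
        for epoch in range(num_epochs)
    ]


# Toy usage: scores the text network assigns to candidate next items.
text_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.1}
positives, negatives = select_cross_modal_samples(
    text_scores, target="b", num_pos=1, neg_start=1, neg_end=3
)
schedule = alternating_schedule(num_epochs=4, block=2)
```

In this toy setup the ID network would be trained on the true target `b` plus whatever `positives` contains, and contrasted against `negatives`. The roles then swap when the schedule switches to the text network.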

Paper Structure (33 sections, 16 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
Related Work
ID-based Methods
Text-Integrated Methods
Multi-modal Learning
Preliminaries
Session-based Recommendation
Preliminary Study
Naive Fusion vs. Independent Training
Exploration of NFRec
Framework
ID and Text Uni-modal Networks
ID Encoder
Text Encoder
Scoring Function
...and 18 more sections

Figures (9)

Figure 1: An illustration of a naive fusion framework.
Figure 2: Session-based recommendation results (%) on the Amazon-French dataset. We compare the models combining ID and text against models trained independently on either ID or text information alone.
Figure 3: Test performance in terms of Hits@20 (%) and training loss comparison on the Amazon-French dataset.
Figure 4: An Overview of AlterRec. (a), (b): two key components – the ID and text uni-modal networks. (c): These networks are trained alternately, learning from each other through predictions generated by the other network.
Figure 5: Test performance across each epoch during the alternative training.
...and 4 more figures
