Breaking the Clusters: Uniformity-Optimization for Text-Based Sequential Recommendation

Wuhan Chen, Zongwei Wang, Min Gao, Xin Xia, Feng Jiang, Junhao Wen

TL;DR

This work addresses non-uniformity in text-based sequential recommendation, where text-derived item embeddings cluster due to semantic similarities. It introduces UniT, a framework that freezes a text encoder and applies three sampling-based uniformity strategies—Unified General Sampling, Sequence-Driven Sampling, and Popularity-Driven Sampling—coupled with a joint loss $L = L_{Rec} + \gamma L_U$ to encourage dispersion among item representations without sacrificing personalization. Empirical results across Office, Music, and ML-1M datasets show consistent gains in HR@20 and NDCG@20, driven by improved representation uniformity and targeted handling of sequence context and item popularity. The method demonstrates practical potential for robust, domain-agnostic text-based SR, with code available for replication.

Abstract

Traditional sequential recommendation (SR) methods heavily rely on explicit item IDs to capture user preferences over time. This reliance introduces critical limitations in cold-start scenarios and domain transfer tasks, where unseen items and new contexts often lack established ID mappings. To overcome these limitations, recent studies have shifted towards leveraging text-only information for recommendation, thereby improving model generalization and adaptability across domains. Although promising, text-based SR faces unique difficulties: items' text descriptions often share semantic similarities that lead to clustered item representations, compromising their uniformity, a property essential for promoting diversity and enhancing generalization in recommendation systems. In this paper, we explore a novel framework to improve the uniformity of item representations in text-based SR. Our analysis reveals that items within a sequence exhibit marked semantic similarity, meaning they are closer in representation than items overall, and that this effect is more pronounced for less popular items, which form tighter clusters compared to their more popular counterparts. Based on these findings, we propose UniT, a framework that employs three pairwise item sampling strategies: Unified General Sampling Strategy, Sequence-Driven Sampling Strategy, and Popularity-Driven Sampling Strategy. Each strategy applies varying degrees of repulsion to selectively adjust the distances between item pairs, thereby refining representation uniformity while considering both sequence context and item popularity. Extensive experiments on multiple real-world datasets demonstrate that our proposed approach outperforms state-of-the-art models, validating the effectiveness of UniT in enhancing both representation uniformity and recommendation accuracy. The source code is available at https://github.com/ccwwhhh/Model-Rec.
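
To make the idea of pairwise repulsion concrete, the following is a minimal sketch of a joint objective of the form $L = L_{Rec} + \gamma L_U$. It is not the authors' implementation. It assumes the uniformity term takes the commonly used Gaussian-potential form, $\log \mathbb{E}\, e^{-t\lVert z_i - z_j\rVert^2}$ over L2-normalized embeddings, which this section does not confirm. The names `uniformity_loss`, `joint_loss`, and `pair_weights` are hypothetical; `pair_weights` merely stands in for the "varying degrees of repulsion" that the three sampling strategies assign to item pairs.

```python
import numpy as np

def uniformity_loss(emb, pair_weights=None, t=2.0):
    """Weighted Gaussian-potential uniformity over all distinct item pairs.

    ASSUMPTION: this functional form is illustrative, not taken from the paper.
    emb: (n, d) array of item embeddings (L2-normalized inside the function).
    pair_weights: optional (n, n) non-negative array; a larger weight means
        that pair contributes more to the repulsion term.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Squared Euclidean distance between every pair of normalized embeddings.
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    # Keep each unordered pair (i < j) exactly once.
    iu = np.triu_indices(len(z), k=1)
    kernel = np.exp(-t * sq_dist[iu])
    w = np.ones_like(kernel) if pair_weights is None else pair_weights[iu]
    # Log of the weighted mean kernel value: near 0 for clustered items,
    # more negative for well-spread items.
    return float(np.log(np.sum(w * kernel) / np.sum(w)))

def joint_loss(rec_loss, emb, gamma=0.1, pair_weights=None):
    """L = L_Rec + gamma * L_U, with rec_loss supplied by the recommender."""
    return rec_loss + gamma * uniformity_loss(emb, pair_weights)

clustered = np.ones((4, 3))   # four identical embeddings
spread = np.eye(4)            # four mutually orthogonal embeddings
print(uniformity_loss(clustered), uniformity_loss(spread))
```

Under these assumptions, tightly clustered embeddings yield a uniformity term near zero, while well-separated embeddings yield a more negative value, so minimizing the joint loss pushes item representations apart. Setting a smaller weight for a pair softens the repulsion applied to it.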

Paper Structure

This paper contains 21 sections, 11 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: The comparison of ID-based SR and text-based SR.
  • Figure 2: Distribution of item representations learned from the dataset of Amazon-Music.
  • Figure 3: An overview of the proposed workflow pipeline for UniT.
  • Figure 4: Sampling strategy that assigns different importance to items. Unified General Sampling Strategy weakens the uniforming operation within the sequence. Popularity-Driven Sampling Strategy incorporates each item's popularity metric into the calculation.
  • Figure 5: The interaction item sequences of two real users.
  • ...and 2 more figures