Collaborative Semantic Alignment in Recommendation Systems

Chen Wang, Liangwei Yang, Zhiwei Liu, Xiaolong Liu, Mingdai Yang, Yueqing Liang, Philip S. Yu

TL;DR

CARec tackles the gap between collaborative filtering and semantic item representations by introducing Reciprocal Alignment, a two-phase training paradigm that first aligns user embeddings to item semantics and then refines item representations with an adaptor that preserves semantic information. In the semantic aligning phase, items act as teachers and users as learners. In the collaborative refining phase the roles switch: users teach items, and an MLP adaptor injects collaborative signals while keeping the item semantics intact. Empirical results on four real-world datasets show CARec achieves state-of-the-art performance in both warm-start and cold-start settings, outperforming ID-based and text-based baselines and enabling effective cold-item recommendations without extra modules. A case study and extensive ablations demonstrate that maintaining item semantic integrity while incorporating collaborative signals is key to CARec’s success, with instructor-xl often delivering the strongest semantic embeddings. The work highlights practical impact for robust recommendations in dynamic-inventory and sparse-interaction domains, and points to future directions in multi-domain semantic preservation and more nuanced role-switch indicators.
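The two-phase schedule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairwise BPR-style loss, the learning rate, the epoch counts, and the function names (train_reciprocal, adapt, bpr_loss) are all hypothetical, and a residual linear map stands in for the MLP adaptor to keep the gradients short. The point it shows is the role switch: in phase 1 only user embeddings are updated, in phase 2 only the adaptor is updated, and the raw semantic item embeddings are never modified.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def adapt(W, e):
    # Assumption: residual linear adaptor e + W e, a stand-in for the MLP adaptor.
    return [e_k + dot(row, e) for e_k, row in zip(e, W)]

def bpr_loss(users, items, W, triples):
    # Assumed BPR-style loss over (user, preferred item, other item) triples.
    total = 0.0
    for u, i, j in triples:
        diff = dot(users[u], adapt(W, items[i])) - dot(users[u], adapt(W, items[j]))
        total += -math.log(sigmoid(diff))
    return total / len(triples)

def train_reciprocal(items, n_users, triples, epochs=(50, 50), lr=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(items[0])
    users = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    W = [[0.0] * dim for _ in range(dim)]  # zero W means the adaptor starts as identity

    # Phase 1 (semantic aligning): items are frozen teachers, users learn.
    for _ in range(epochs[0]):
        for u, i, j in triples:
            d = [a - b for a, b in zip(items[i], items[j])]
            g = sigmoid(-dot(users[u], d))  # magnitude of the loss gradient w.r.t. diff
            users[u] = [p + lr * g * d_k for p, d_k in zip(users[u], d)]

    # Phase 2 (collaborative refining): users are frozen, only the adaptor learns.
    for _ in range(epochs[1]):
        for u, i, j in triples:
            d = [a - b for a, b in zip(items[i], items[j])]
            diff = dot(users[u], adapt(W, items[i])) - dot(users[u], adapt(W, items[j]))
            g = sigmoid(-diff)
            for r in range(dim):
                for c in range(dim):
                    # d(diff)/dW[r][c] = users[u][r] * d[c]
                    W[r][c] += lr * g * users[u][r] * d[c]
    return users, W
```

Because the item list is only read, never written, the semantic embeddings remain available unchanged after training; any collaborative signal lives in the user embeddings and the adaptor weights.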

Abstract

Traditional recommender systems primarily leverage identity-based (ID) representations for users and items, while the advent of pre-trained language models (PLMs) has introduced rich semantic modeling of item descriptions. However, PLMs often overlook the vital collaborative filtering signals, leading to challenges in merging collaborative and semantic representation spaces and fine-tuning semantic representations for better alignment with warm-start conditions. Our work introduces CARec, a cutting-edge model that integrates collaborative filtering with semantic representations, ensuring the alignment of these representations within the semantic space while retaining key semantics. Our experiments across four real-world datasets show significant performance improvements. CARec's collaborative alignment approach also extends its applicability to cold-start scenarios, where it demonstrates notable enhancements in recommendation accuracy. The code will be available upon paper acceptance.

Paper Structure (33 sections, 12 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: CARec comprises three key phases: the semantic aligning phase, the collaborative refining phase, and the inference phase. During the semantic aligning phase, the model aligns user representations with the item semantic representation space. The collaborative refining phase, in contrast, guides item representations to incorporate collaborative signals while preserving their semantic characteristics. Finally, in the inference phase, the model provides personalized recommendations using the learned user embeddings and the transformed item embeddings.
  • Figure 2: Ablation study of CARec on Electronic
  • Figure 3: Overall performance in each training phase
  • Figure 4: Parameter analysis of MLP on Electronic
  • Figure 5: Comparison of representation space after model alignment. The left figure illustrates the representation space following Collaborative Filtering Alignment, while the right figure depicts the representation space after Collaborative Alignment. In both figures, the blue nodes are item semantic representations, the purple nodes are item representations mapped by the MLP, and the green nodes are learned user representations.
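The inference phase described in the Figure 1 caption (scoring with learned user embeddings and transformed item embeddings) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names mlp_adaptor and recommend, the one-hidden-layer ReLU architecture, the dot-product scoring, and the toy item identifiers are hypothetical and not taken from the paper. The sketch shows why a cold item needs no extra module: it has a semantic embedding, so it passes through the same adaptor as warm items and is ranked alongside them.

```python
def mlp_adaptor(e, W1, b1, W2, b2):
    # Hypothetical one-hidden-layer MLP with ReLU; layer sizes are illustrative.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, e)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def recommend(user_vec, item_semantic, adaptor, k=2, seen=()):
    # item_semantic maps item id -> semantic embedding (warm and cold items alike).
    scores = {}
    for item_id, e in item_semantic.items():
        if item_id in seen:
            continue  # skip items the user has already interacted with
        z = adaptor(e)  # transformed item embedding
        scores[item_id] = sum(u * x for u, x in zip(user_vec, z))  # assumed dot-product score
    # Return the k highest-scoring item ids.
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In this sketch the adaptor is passed in as a callable, so a trained adaptor can be swapped in without changing the ranking code.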