CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation
Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, Xiangnan He
TL;DR
CoLLM addresses a shortcoming of LLM-based recommenders: they rely mainly on text semantics and often neglect the collaborative information in user-item interactions. CoLLM models this information externally. Its Collaborative Information Encoding (CIE) module maps latent user and item representations from a traditional collaborative filtering (CF) model into the LLM embedding space, and a LoRA-based predictor is trained with a two-step tuning process. Experiments on MovieLens-1M and Amazon-Book show gains in both warm-start and cold-start settings, and the approach generalizes across LLM backbones. The work demonstrates a flexible, scalable way to fuse collaborative signals with LLM reasoning for recommendation.
Abstract
Leveraging Large Language Models as Recommenders (LLMRec) has gained significant attention and introduced fresh perspectives in user preference modeling. Existing LLMRec approaches prioritize text semantics, usually neglecting the valuable collaborative information from user-item interactions in recommendations. While these text-emphasizing approaches excel in cold-start scenarios, they may yield sub-optimal performance in warm-start situations. In pursuit of superior recommendations for both cold-start and warm-start scenarios, we introduce CoLLM, an innovative LLMRec methodology that seamlessly incorporates collaborative information into LLMs for recommendation. CoLLM captures collaborative information through an external traditional model and maps it to the input token embedding space of the LLM, forming collaborative embeddings that the LLM can use. Through this external integration of collaborative information, CoLLM ensures effective modeling of collaborative information without modifying the LLM itself, providing the flexibility to employ various collaborative information modeling techniques. Extensive experiments validate that CoLLM adeptly integrates collaborative information into LLMs, resulting in enhanced recommendation performance. We release the code and data at https://github.com/zyang1580/CoLLM.
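
The mapping step described above can be illustrated with a small sketch. It is not the released implementation. It assumes that the mapping is a two-layer MLP, that the dimensions are toy values, and that the mapped vectors are spliced into the prompt at placeholder positions; none of these details is stated in the abstract, and all function and variable names are hypothetical. The LLM, the LoRA predictor, and the tuning procedure are left out entirely.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a randomly initialised linear layer."""
    scale = 1.0 / in_dim ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def linear(x, layer):
    """Apply y = W x + b to a single vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def map_to_llm_space(cf_vec, hidden_layer, output_layer):
    """Project a CF latent vector to the LLM token-embedding width (assumed MLP)."""
    return linear(relu(linear(cf_vec, hidden_layer)), output_layer)

def splice_into_prompt(token_embs, replacements):
    """Return a copy of the prompt embeddings with the given positions replaced."""
    result = list(token_embs)
    for pos, emb in replacements.items():
        result[pos] = emb
    return result

# Toy sizes (assumptions): CF latent width 4, LLM embedding width 8.
rng = random.Random(0)
CF_DIM, HIDDEN_DIM, LLM_DIM = 4, 6, 8
hidden_layer = make_linear(CF_DIM, HIDDEN_DIM, rng)
output_layer = make_linear(HIDDEN_DIM, LLM_DIM, rng)

# Stand-ins for user/item vectors produced by a pretrained CF model.
user_cf = [rng.gauss(0, 1) for _ in range(CF_DIM)]
item_cf = [rng.gauss(0, 1) for _ in range(CF_DIM)]
user_emb = map_to_llm_space(user_cf, hidden_layer, output_layer)
item_emb = map_to_llm_space(item_cf, hidden_layer, output_layer)

# Stand-in for the embedded prompt: 5 tokens, with hypothetical
# user and item placeholders at positions 1 and 3.
prompt_token_embs = [[0.0] * LLM_DIM for _ in range(5)]
USER_POS, ITEM_POS = 1, 3
llm_inputs = splice_into_prompt(
    prompt_token_embs, {USER_POS: user_emb, ITEM_POS: item_emb}
)
print(len(llm_inputs), len(llm_inputs[USER_POS]))  # 5 8
```

Because the collaborative vectors enter as ordinary input embeddings, the sequence length and embedding width the LLM sees are unchanged, which is consistent with the abstract's point that the LLM itself needs no modification and that the external CF model can be swapped.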
