General Item Representation Learning for Cold-start Content Recommendations
Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee
TL;DR
The paper tackles cold-start item recommendation by leveraging rich multimodal content signals rather than relying on user-item interactions alone. It introduces a domain/dataset-agnostic item content representation framework built on Transformer-based modality-specific encoders with flexible fusion strategies, trained end-to-end on user activity data only. Two training objectives are proposed: a rating ranking loss and an optional multimodal alignment loss to harmonize content modalities. Empirical results on movie and news benchmarks demonstrate state-of-the-art cold-start performance and good transferability across domains, while reducing dependence on large labeled classification data. Overall, the approach yields fine-grained item representations that better capture user tastes and support scalable deployment.
Abstract
Cold-start item recommendation is a long-standing challenge in recommendation systems. A common remedy is a content-based approach, but the rich information in raw content of various forms has not been fully utilized. In this paper, we propose a domain/data-agnostic item representation learning framework for cold-start recommendations that adopts a Transformer-based architecture and is thereby naturally equipped with multimodal alignment among various features. Our proposed model is end-to-end trainable without any classification labels, which are not only costly to collect but also suboptimal for recommendation-purpose representation learning. Through extensive experiments on real-world movie and news recommendation benchmarks, we verify that our approach preserves fine-grained user taste better than state-of-the-art baselines and that it is universally applicable to multiple domains at large scale.
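To make the idea of learning item representations from user activity alone more concrete, the following is a minimal sketch of a pairwise ranking objective over a user's rated items. It is an illustration under stated assumptions, not the paper's actual loss or architecture: the function names (`fuse`, `pairwise_ranking_loss`), the mean-pooling fusion, the dot-product scoring, and the logistic pairwise form are all hypothetical choices made here for clarity.

```python
import math
from itertools import combinations


def fuse(modality_vectors):
    """Average per-modality embeddings into a single item vector.

    Assumption: mean pooling stands in for a learned fusion module.
    """
    dim = len(modality_vectors[0])
    n = len(modality_vectors)
    return [sum(v[d] for v in modality_vectors) / n for d in range(dim)]


def dot(a, b):
    """Plain dot product, used here as a hypothetical user-item score."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_ranking_loss(user_vec, rated_items):
    """Mean logistic loss over item pairs with different ratings.

    rated_items is a list of (item_vector, rating) tuples for one user.
    For each pair, the higher-rated item should receive the higher score.
    No classification labels are involved, only the user's ratings.
    """
    losses = []
    for (vi, ri), (vj, rj) in combinations(rated_items, 2):
        if ri == rj:
            continue  # ties carry no ordering signal
        hi, lo = (vi, vj) if ri > rj else (vj, vi)
        margin = dot(user_vec, hi) - dot(user_vec, lo)
        # softplus(-margin) equals -log(sigmoid(margin)); stable form
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses) if losses else 0.0
```

In this sketch, the loss is small when the scores agree with the user's rating order and grows when they disagree, so gradients from such an objective would push the item encoders toward representations that reflect user taste rather than category labels.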
