MARec: Metadata Alignment for cold-start Recommendation

Julien Monteil, Volodymyr Vaskovych, Wentao Lu, Anirban Majumder, Anton van den Hengel

TL;DR

MARec tackles the cold-start problem by aligning metadata-derived item similarities with click-based similarities through a regularized objective that fuses a backbone recommender with metadata embeddings. The framework supports multiple backbones (e.g., Variational Autoencoders, EASE, Modified SLIM) and introduces an alignment model that leverages smoothed cosine similarities across metadata subspaces, weighted to emphasize cold items. Empirically, MARec achieves state-of-the-art gains on four cold-start datasets, with semantic features such as LLM embeddings, images, and tags delivering substantial improvements. It also remains competitive in warm-start regimes and offers closed-form solutions in several configurations, making it practical for deployment. Overall, the approach provides a scalable, flexible path to robust cold-start recommendations while maintaining strong performance as data warms up.
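To make the alignment idea above concrete, here is a minimal sketch of smoothed cosine similarities summed over metadata subspaces and re-weighted toward cold items. It is an illustration under stated assumptions, not the paper's exact formulation: the smoothing term `delta`, the `1 / (1 + click count)` weighting, the function names, and the toy data are all hypothetical choices.

```python
import numpy as np

def smoothed_cosine(features, delta=1.0):
    """Item-item cosine similarity with a smoothing term in the denominator.

    `delta` (an assumed hyperparameter) shrinks similarities that rest on
    small-norm, low-evidence feature vectors.
    """
    dots = features @ features.T
    norms = np.linalg.norm(features, axis=1)
    sim = dots / (np.outer(norms, norms) + delta)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return sim

def cold_item_weights(clicks):
    """Assumed weighting: items with fewer clicks receive larger weights."""
    item_counts = clicks.sum(axis=0)
    return 1.0 / (1.0 + item_counts)

# Toy click matrix (users x items); item 1 has no clicks, so it is cold.
clicks = np.array([[1, 0, 1],
                   [1, 0, 0],
                   [1, 0, 0]], dtype=float)

# Two hypothetical metadata subspaces, one row per item.
subspaces = {
    "tags": np.array([[1, 0], [1, 0], [0, 1]], dtype=float),
    "text": np.array([[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]),
}

# Sum the per-subspace similarities, then scale each target item's column
# by its cold-item weight.
sim = sum(smoothed_cosine(f) for f in subspaces.values())
weights = cold_item_weights(clicks)
alignment = sim * weights[None, :]
```

In this toy setup the cold item (index 1) receives the largest weight, so similarities pointing toward it are preserved while those pointing toward the heavily clicked item 0 are scaled down.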

Abstract

For many recommender systems, the primary data source is a historical record of user clicks. The associated click matrix is often very sparse, as the number of users × products can be far larger than the number of clicks. Such sparsity is accentuated in cold-start settings, which makes the efficient use of metadata information of paramount importance. In this work, we propose a simple approach to address cold-start recommendations by leveraging content metadata, Metadata Alignment for cold-start Recommendation. We show that this approach can readily augment existing matrix factorization and autoencoder approaches, enabling a smooth transition to top performing algorithms in warmer set-ups. Our experimental results indicate three separate contributions: first, we show that our proposed framework largely beats SOTA results on 4 cold-start datasets with different sparsity and scale characteristics, with gains ranging from +8.4% to +53.8% on reported ranking metrics; second, we provide an ablation study on the utility of semantic features, and show that the additional gain obtained by leveraging such features ranges between +46.8% and +105.5%; and third, our approach is by construction highly competitive in warm set-ups, and we propose a closed-form solution outperformed by SOTA results by only 0.8% on average.

Paper Structure (26 sections, 13 equations, 1 figure, 8 tables)

This paper contains 26 sections, 13 equations, 1 figure, 8 tables.

Figures (1)

  • Figure 1: Simplified architecture of MARec. The embedding model $f^{\text{E}}_{\phi}$ embeds each item's metadata in a joint embedding space, and the alignment function $f^{\text{A}}_{\zeta}$ retrieves similar items and aligns them with the similarity space provided by the backbone model $f^{\text{B}}_{\theta}$. The architecture is trained with a reconstruction loss on the click history.
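The interplay between backbone and alignment in Figure 1 can be illustrated with a generic ridge-style item-item model that has a closed-form solution. This is a sketch under assumptions, not the objective from the paper: it minimizes $\|X - XB\|_F^2 + \lambda\|B\|_F^2 + \beta\|B - G\|_F^2$, where $X$ is the click matrix, $B$ the learned item-item weights, and $G$ a metadata-derived similarity matrix. The penalty form, the names `lam` and `beta`, and the toy data are hypothetical, and the zero-diagonal constraint used by EASE is omitted for brevity.

```python
import numpy as np

def aligned_ridge(X, G, lam=1.0, beta=1.0):
    """Closed-form minimizer of ||X - XB||^2 + lam*||B||^2 + beta*||B - G||^2.

    Setting the gradient to zero gives
    (X^T X + (lam + beta) I) B = X^T X + beta * G.
    """
    gram = X.T @ X
    n_items = gram.shape[0]
    lhs = gram + (lam + beta) * np.eye(n_items)
    return np.linalg.solve(lhs, gram + beta * G)

# Toy click matrix (users x items); item 2 is cold (no clicks).
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [1, 0, 0]], dtype=float)

# Hypothetical metadata similarity: item 0 resembles the cold item 2.
G = np.zeros((3, 3))
G[0, 2] = G[2, 0] = 0.5

B_aligned = aligned_ridge(X, G, lam=1.0, beta=1.0)
B_plain = aligned_ridge(X, G, lam=1.0, beta=0.0)  # no metadata term

scores_aligned = X @ B_aligned  # predicted user-item scores
scores_plain = X @ B_plain
```

Without the metadata term, the cold item's column of weights is zero, so no user can ever be recommended that item. With the metadata term, users who clicked the similar item 0 receive a positive score for the cold item, and as click data accumulates the $X^\top X$ term increasingly dominates the solution.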