Embedding in Recommender Systems: A Survey

Maolin Wang, Xinjian Zhao, Wanyu Wang, Sheng Zhang, Jiansheng Li, Bowen Yu, Binhao Wang, Shucheng Zhou, Dawei Yin, Qing Li, Ruocheng Guo, Xiangyu Zhao

TL;DR

This survey covers embedding techniques in recommender systems across matrix, sequential, and graph data, along with scalability-oriented approaches (AutoML, hashing, and quantization) and LLM-driven enhancements. It connects traditional matrix factorization and factorization machines with modern sequential encoders and graph representation learning, including self-supervised and predictive methods. A central theme is using Large Language Models (LLMs) to supply semantic information and guidance to embeddings while respecting practical constraints such as efficiency, fairness, and privacy. The survey also highlights open challenges and outlines directions for unifying embeddings across data modalities and graph structures to support scalable, semantically rich recommendation.

Abstract

Recommender systems have become an essential component of many online platforms, providing personalized recommendations to users. A crucial aspect is embedding techniques that convert the high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors, which can enhance the recommendation performance. Embedding techniques have revolutionized the capture of complex entity relationships, generating significant research interest. This survey presents a comprehensive analysis of recent advances in recommender system embedding techniques. We examine centralized embedding approaches across matrix, sequential, and graph structures. In matrix-based scenarios, collaborative filtering generates embeddings that effectively model user-item preferences, particularly in sparse data environments. For sequential data, we explore various approaches including recurrent neural networks and self-supervised methods such as contrastive and generative learning. In graph-structured contexts, we analyze techniques like node2vec that leverage network relationships, along with applicable self-supervised methods. Our survey addresses critical scalability challenges in embedding methods and explores innovative directions in recommender systems. We introduce emerging approaches, including AutoML, hashing techniques, and quantization methods, to enhance performance while reducing computational complexity. Additionally, we examine the promising role of Large Language Models (LLMs) in embedding enhancement. Through detailed discussion of various architectures and methodologies, this survey aims to provide a thorough overview of state-of-the-art embedding techniques in recommender systems, while highlighting key challenges and future research directions.

Paper Structure (43 sections, 26 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: FunkSVD model (funk2006netflix). It approximately factorizes the matrix $\mathbf{R}\in\mathbb{R}^{m \times n}$ into a smaller user matrix $\mathbf{U}\in\mathbb{R}^{m \times d}$ and item matrix $\mathbf{V}\in\mathbb{R}^{n \times d}$, where $d$ is the hidden embedding dimension.
  • Figure 2: Sequence augmentation methods. (a) Crop items for contrastive learning. (b) Mask items to prevent overfitting. (c) Reorder items to make recommenders more robust to varying sequence orders. (d) Substitute highly correlated yet possibly redundant items. (e) Insert related items before the original item.
  • Figure 3: Graph types in recommender systems. (a) Homogeneous graph: all nodes are of the same type. (b) Bipartite graph: two node sets, with edges connecting only nodes from different sets. (c) Heterogeneous graph: multiple node types that can be interconnected. (d) Hypergraph: each hyperedge can connect arbitrarily many nodes.
  • Figure 4: Graph augmentation methods. (a) Drop interaction edges or (b) drop user/item nodes to identify influential components. (c) Add new edges that introduce user-item similarity. (d) Sample local nodes and edges into a subgraph to accentuate connectivity.
  • Figure 5: The mechanism and challenge of hash embeddings. (Left) The "hashing trick" maps high-cardinality IDs to a smaller embedding table. (Top-Right) A key challenge is hash collision, where different IDs (e.g., 897 and 2045) collapse to the same index, causing representation ambiguity. (Bottom-Right) Multi-hash methods provide a solution by using multiple hash functions to ensure unique representations.
  • ...and 3 more figures
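
The factorization described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration rather than the survey's implementation: it assumes a squared-error objective with L2 regularization trained by stochastic gradient descent over observed entries only, and the function names (`funk_svd`, `predict`, `rmse`), hyperparameters, and toy ratings are all hypothetical.

```python
import random


def funk_svd(ratings, m, n, d=2, lr=0.02, reg=0.02, epochs=300, seed=0):
    """Learn U (m x d) and V (n x d) so that U[u] . V[i] approximates R[u][i].

    `ratings` is a list of (user_index, item_index, value) triples holding
    only the observed entries of R; missing entries are never visited.
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(m)]
    V = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(U[u][k] * V[i][k] for k in range(d))
            for k in range(d):
                u_k, v_k = U[u][k], V[i][k]
                # Gradient step on squared error plus an L2 penalty.
                U[u][k] += lr * (err * v_k - reg * u_k)
                V[i][k] += lr * (err * u_k - reg * v_k)
    return U, V


def predict(U, V, u, i):
    """Predicted preference: dot product of user and item embeddings."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def rmse(ratings, U, V):
    """Root-mean-square error over the observed entries."""
    total = sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Toy data (made up for illustration): 4 users, 3 items, 8 observed ratings.
toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
       (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
U0, V0 = funk_svd(toy, m=4, n=3, epochs=0)  # untrained baseline
U, V = funk_svd(toy, m=4, n=3)
print(f"RMSE before: {rmse(toy, U0, V0):.3f}, after: {rmse(toy, U, V):.3f}")
```

Each row of `U` and `V` is the learned embedding of one user or item, and `predict` can also be called on unobserved (user, item) pairs, which is how the factorization yields recommendations.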
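
The hashing trick and the multi-hash remedy from the Figure 5 caption can be illustrated with a small sketch. It is not taken from the survey: the class name `MultiHashEmbedding`, the salted BLAKE2 hash, and the choice to sum the per-table rows are assumptions made for this example (concatenating the rows is another common choice). Note that multiple hash functions make full collisions much less likely rather than strictly impossible.

```python
import hashlib
import random


def bucket(raw_id, num_buckets, salt):
    """Deterministically map an ID to a bucket; `salt` selects the hash function."""
    digest = hashlib.blake2b(f"{salt}:{raw_id}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


class MultiHashEmbedding:
    """Hypothetical multi-hash embedding: one small table per hash function."""

    def __init__(self, num_buckets, dim, num_hashes=2, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.tables = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_buckets)]
            for _ in range(num_hashes)
        ]

    def indices(self, raw_id):
        """One bucket index per hash function."""
        return tuple(
            bucket(raw_id, self.num_buckets, salt)
            for salt in range(len(self.tables))
        )

    def lookup(self, raw_id):
        """Sum the rows selected by each hash function into one vector."""
        rows = [table[idx] for table, idx in zip(self.tables, self.indices(raw_id))]
        return [sum(vals) for vals in zip(*rows)]


# 1000 IDs squeezed into 50 buckets per table: a single hash must collide.
emb = MultiHashEmbedding(num_buckets=50, dim=4, num_hashes=2)
ids = range(1000)
single = {emb.indices(i)[0] for i in ids}  # distinct slots with one hash
double = {emb.indices(i) for i in ids}     # distinct index pairs with two hashes
print(len(single), "distinct single-hash slots;", len(double), "distinct index pairs")
```

With one hash function, at most `num_buckets` distinct representations exist, so many IDs share a vector. With two functions, two IDs receive the same summed vector only if they collide under both hashes, while storage stays at `num_hashes * num_buckets` rows rather than one row per ID.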