
Rethinking ANN-based Retrieval: Multifaceted Learnable Index for Large-scale Recommendation System

Jiang Zhang, Yubo Wang, Wei Chang, Lu Han, Xingying Cheng, Feng Zhang, Min Li, Songhao Jiang, Wei Zheng, Harry Tran, Zhen Wang, Lei Chen, Yueming Wang, Benyu Zhang, Xiangjun Fan, Bi Xue, Qifan Wang

TL;DR

The paper proposes MultiFaceted Learnable Index (MFLI), a scalable, real-time retrieval paradigm that learns multifaceted item embeddings and their indices within a unified framework, eliminating ANN search at serving time.

Abstract

Approximate nearest neighbor (ANN) search is widely used in the retrieval stage of large-scale recommendation systems. In this stage, candidate items are indexed by their learned embedding vectors, and ANN search is executed for each user (or item) query to retrieve a set of relevant items. However, ANN-based retrieval has two key limitations. First, item embeddings and their indices are typically learned in separate stages: indexing is often performed offline after the embeddings are trained, which can yield suboptimal retrieval quality, especially for newly created items. Second, although ANN search offers sublinear query time, it must still be run for every request, incurring substantial computation cost at industry scale. In this paper, we propose MultiFaceted Learnable Index (MFLI), a scalable, real-time retrieval paradigm that learns multifaceted item embeddings and indices within a unified framework and eliminates ANN search at serving time. Specifically, we construct a multifaceted hierarchical codebook via residual quantization of item embeddings and co-train the codebook with the embeddings. We further introduce an efficient multifaceted indexing structure and mechanisms that support real-time updates. At serving time, the learned hierarchical indices are used directly to identify relevant items, avoiding ANN search altogether. Extensive experiments on real-world data with billions of users show that, compared with prior state-of-the-art methods, MFLI improves recall on engagement tasks by up to 11.8%, cold-content delivery by up to 57.29%, and semantic relevance by 13.5%. We also deploy MFLI in production and report online experimental results demonstrating improved engagement, reduced popularity bias, and higher serving efficiency.
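The abstract builds each item's index by residual quantization of its embedding. The sketch below illustrates only the encoding step under simple assumptions: codewords are fixed toy values rather than co-trained with the embeddings, the nearest codeword is chosen by squared Euclidean distance, and the helper names (`residual_quantize`, `multifaceted_codes`) are hypothetical. The tuple of chosen codeword ids plays the role of a hierarchical index, and each facet is quantized with its own codebook stack.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]  # one level: a list of codewords


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(x: Vector, levels: List[Codebook]) -> Tuple[Tuple[int, ...], Vector]:
    """Greedy residual quantization of one embedding.

    At each level, choose the codeword nearest to the current residual and
    subtract it. The tuple of chosen codeword ids is the hierarchical index;
    the sum of the chosen codewords is the quantized reconstruction.
    """
    residual = list(x)
    recon = [0.0] * len(x)
    codes = []
    for level in levels:
        k = min(range(len(level)), key=lambda i: sq_dist(residual, level[i]))
        codes.append(k)
        residual = [r - c for r, c in zip(residual, level[k])]
        recon = [q + c for q, c in zip(recon, level[k])]
    return tuple(codes), recon


def multifaceted_codes(facet_embs: List[Vector], facet_levels: List[List[Codebook]]) -> List[Tuple[int, ...]]:
    """Quantize each facet's embedding with that facet's own codebook stack (hypothetical helper)."""
    return [residual_quantize(e, lv)[0] for e, lv in zip(facet_embs, facet_levels)]


# Toy 2-level codebook in 2-D (illustrative values, not learned).
levels = [
    [[1.0, 0.0], [0.0, 1.0]],  # level 1: coarse codewords
    [[0.1, 0.0], [0.0, 0.1]],  # level 2: codewords for the level-1 residual
]
codes, recon = residual_quantize([1.0, 0.1], levels)
print(codes, recon)
```

In a learnable index, the distances and codewords would be differentiable parts of the training graph; the greedy loop above only shows how coarse-to-fine codes arise.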

Paper Structure (21 sections, 14 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Traditional ANN-based retrieval contains three stages: item embedding learning, offline item indexing, and online ANN search at serving time. In contrast, learnable-index (LI) based retrieval jointly learns item embeddings and indices during training and performs fast index lookup during serving.
  • Figure 2: Overview of the MFLI framework. During the training phase, it learns a multifaceted embedding for each item and performs residual quantization on candidate item embeddings for each facet in parallel. The codebook and item embeddings are jointly trained via a multifaceted loss and a quantized multifaceted loss. The serving part consists of four modules: 1) index lookup, which looks up the multifaceted indices of the query item IDs; 2) index selection, which selects the top $K$ indices; 3) item lookup, which fetches all items in each selected index; and 4) per-index rerank, which selects the top $N$ items for each selected index.
  • Figure 3: Illustration of index rebalancing, where undersized indices are merged, oversized indices are split, and masked items are assigned an invalid index.
  • Figure 4: Delta update of MFLI’s indexing structure. Every $\Delta t$ (e.g., 1 minute), a delta snapshot storing the indexing structure of fresh items is produced. Every $\Delta T$ (e.g., 30 minutes), a full snapshot storing the refreshed indexing structure of the full item pool is produced. During serving, MFLI fetches items from both the full and delta snapshots.
  • Figure 5: Analysis of the learned indices in MFLI. (a) Index-size distribution (items per index). (b) Item distribution across index-size buckets (items per index-size bucket). (c) Index usage (percentage of non-empty indices) over training snapshots. (d) Intra-index vs. inter-index relevance.
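The four serving modules listed in the Figure 2 caption can be sketched as a plain lookup pipeline. The caption does not say how indices are scored or how items are reranked, so this sketch assumes two stand-ins: indices are ranked by how many query items map to them, and items are reranked by a precomputed per-item score. The function `serve` and its dictionary inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Index = Tuple[int, ...]  # a hierarchical code tuple, e.g. (3, 17)


def serve(
    query_item_ids: Sequence[str],
    item_to_indices: Dict[str, List[Index]],  # hypothetical store: item -> its indices across facets
    index_to_items: Dict[Index, List[str]],   # hypothetical inverted lists: index -> member items
    item_scores: Dict[str, float],            # hypothetical per-item rerank scores
    top_k: int,
    top_n: int,
) -> Dict[Index, List[str]]:
    # 1) Index lookup: gather the multifaceted indices of every query item.
    hits = Counter(idx for q in query_item_ids for idx in item_to_indices.get(q, []))
    # 2) Index selection: keep the K indices hit most often (a stand-in scoring rule).
    selected = [idx for idx, _ in hits.most_common(top_k)]
    results: Dict[Index, List[str]] = {}
    for idx in selected:
        # 3) Item lookup: fetch every item filed under this index; no ANN search involved.
        members = index_to_items.get(idx, [])
        # 4) Per-index rerank: keep the N highest-scoring items of this index.
        results[idx] = sorted(members, key=lambda i: item_scores.get(i, 0.0), reverse=True)[:top_n]
    return results


recs = serve(
    query_item_ids=["q1", "q2", "q3"],
    item_to_indices={"q1": [(0, 1), (2, 0)], "q2": [(0, 1), (2, 0)], "q3": [(0, 1)]},
    index_to_items={(0, 1): ["a", "b", "c"], (2, 0): ["d", "e"]},
    item_scores={"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.1, "e": 0.7},
    top_k=2,
    top_n=2,
)
print(recs)
```

Every step here is a dictionary lookup or a small sort, which is the point the caption makes: retrieval cost depends on $K$ and $N$, not on a nearest-neighbor search over the item pool.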
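The rebalancing in the Figure 3 caption (merge undersized indices, split oversized ones, give masked items an invalid index) can be illustrated with a toy rule set. The specific rules below are assumptions, not the paper's procedure: undersized indices are merged with siblings that share the same parent code prefix, oversized indices are cut into fixed-size chunks with a chunk id appended, and `INVALID` and `rebalance` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Index = Tuple[int, ...]
INVALID: Index = (-1,)  # hypothetical sentinel index for masked items


def rebalance(assign: Dict[str, Index], masked: Set[str], min_size: int, max_size: int) -> Dict[str, Index]:
    # Masked items leave the pool and receive the invalid index.
    out: Dict[str, Index] = {item: INVALID for item in assign if item in masked}
    groups: Dict[Index, List[str]] = defaultdict(list)
    for item, idx in assign.items():
        if item not in masked:
            groups[idx].append(item)

    # Merge (assumed rule): undersized indices are pooled under their parent prefix.
    # A pooled group may still be small if it has few undersized siblings.
    merged: Dict[Index, List[str]] = defaultdict(list)
    for idx, items in groups.items():
        key = idx[:-1] if len(items) < min_size else idx
        merged[key].extend(items)

    # Split (assumed rule): oversized indices become chunks of at most max_size items.
    for idx, items in merged.items():
        if len(items) <= max_size:
            for item in items:
                out[item] = idx
        else:
            for start in range(0, len(items), max_size):
                for item in items[start:start + max_size]:
                    out[item] = idx + (start // max_size,)
    return out


assign = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 0),
          "e": (1, 0), "f": (1, 0), "g": (1, 0), "h": (2, 0)}
balanced = rebalance(assign, masked={"h"}, min_size=2, max_size=3)
print(balanced)
```

Because every original index has the same depth, merged keys (one code shorter) and split keys (one code longer) cannot collide with untouched indices.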
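The full-plus-delta scheme in the Figure 4 caption can be sketched as a small in-memory store. Two behaviors are assumptions not stated in the caption: each new delta snapshot is merged into the pending delta, and loading a full snapshot clears the delta because the full pool already includes earlier fresh items. `SnapshotStore` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Index = Tuple[int, ...]
Snapshot = Dict[Index, List[str]]  # index -> item ids


@dataclass
class SnapshotStore:
    full: Snapshot = field(default_factory=dict)   # full item pool, refreshed every ΔT (e.g. 30 min)
    delta: Snapshot = field(default_factory=dict)  # fresh items only, refreshed every Δt (e.g. 1 min)

    def load_full(self, snapshot: Snapshot) -> None:
        # Assumption: a new full snapshot already contains earlier fresh items,
        # so the pending delta can be dropped.
        self.full = {idx: list(items) for idx, items in snapshot.items()}
        self.delta = {}

    def load_delta(self, snapshot: Snapshot) -> None:
        # Assumption: each delta snapshot is merged into the pending delta.
        for idx, items in snapshot.items():
            self.delta.setdefault(idx, []).extend(items)

    def fetch(self, idx: Index) -> List[str]:
        # Serving reads from both snapshots; duplicates are dropped, order is kept.
        seen = set()
        out = []
        for item in self.full.get(idx, []) + self.delta.get(idx, []):
            if item not in seen:
                seen.add(item)
                out.append(item)
        return out


store = SnapshotStore()
store.load_full({(0, 1): ["a", "b"]})
store.load_delta({(0, 1): ["new1"], (3, 2): ["new2"]})
print(store.fetch((0, 1)), store.fetch((3, 2)))
```

Fresh items become retrievable as soon as their delta snapshot lands, without waiting for the next full rebuild of the index.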