
Real-time Indexing for Large-scale Recommendation by Streaming Vector Quantization Retriever

Xingyan Bin, Jianfei Cui, Wujie Yan, Zhichen Zhao, Xintian Han, Chongyang Yan, Feng Zhang, Xun Zhou, Qi Wu, Zuotao Liu

TL;DR

This paper addresses the bottleneck of real-time, scalable retrieval in large-scale recommender systems by introducing Streaming Vector Quantization (Streaming VQ), an index that attaches items to clusters in real time to capture emergent trends. It emphasizes index immediacy, reparability without full reconstruction, and index balancing, while maintaining compatibility with sophisticated ranking models through mechanisms like merge sort serving and two-tower foundations. Multi-task extensions and empirical results on Douyin and Douyin Lite show substantial improvements in key engagement metrics, surpassing traditional indexes like HNSW and DR. The work argues that indexing quality and real-time adaptability are as crucial as model complexity, offering a practical and scalable paradigm for industrial recommendation systems.
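
To make "attaching items to clusters in real time" and "reparability without full reconstruction" concrete, here is a minimal sketch. It assumes a nearest-codeword assignment with an exponential-moving-average (EMA) codebook update, a standard vector-quantization technique. The class name `StreamingVQIndex`, the `decay` value, and the update rule are illustrative assumptions, not the authors' implementation. An item that reappears in the stream is simply moved to its current nearest cluster, so the index is repaired incrementally instead of being rebuilt.

```python
import numpy as np


class StreamingVQIndex:
    """Hypothetical streaming VQ index: each incoming item is attached to its
    nearest codeword, and that codeword is nudged toward the item by an EMA."""

    def __init__(self, num_clusters: int, dim: int, decay: float = 0.99, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(num_clusters, dim))  # one row per cluster
        self.decay = decay                                      # EMA decay (assumed value)
        self.item_to_cluster: dict = {}                         # forward index: item -> cluster
        self.members = {k: set() for k in range(num_clusters)}  # inverted index: cluster -> items

    def attach(self, item_id, emb: np.ndarray) -> int:
        # Nearest codeword by squared L2 distance.
        k = int(np.argmin(((self.codebook - emb) ** 2).sum(axis=1)))
        # Repair in place: if the item was attached elsewhere, move it.
        old = self.item_to_cluster.get(item_id)
        if old is not None:
            self.members[old].discard(item_id)
        self.item_to_cluster[item_id] = k
        self.members[k].add(item_id)
        # EMA update keeps the codeword tracking the live item distribution.
        self.codebook[k] = self.decay * self.codebook[k] + (1.0 - self.decay) * emb
        return k
```

Each call touches only one codeword and two index entries, which is what makes per-event updates cheap enough to run on the training stream.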

Abstract

Retrievers, which form one of the most important recommendation stages, are responsible for efficiently selecting likely positive samples and passing them to later stages under strict latency limits. Because of this, large-scale systems rely on approximate calculations and indexes, combined with a simple ranking model, to roughly shrink the candidate set. Since simple models cannot produce precise predictions, most existing methods focus on incorporating complicated ranking models. However, another fundamental problem, index effectiveness, remains unresolved, and it also limits the gains from more complicated models. In this paper, we propose a novel index structure, the streaming Vector Quantization (VQ) model, as a new retrieval paradigm. Streaming VQ attaches items to indexes in real time, granting it immediacy. Moreover, through careful verification of possible variants, it achieves additional benefits such as index balancing and reparability, enabling it to support complicated ranking models just as existing approaches do. As a lightweight and implementation-friendly architecture, streaming VQ has been deployed in Douyin and Douyin Lite, where it replaced all major retrievers and yielded remarkable user engagement gains.
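
The abstract lists index balancing as a benefit without defining how balance is measured. One common way to quantify it, assumed here and not necessarily the authors' criterion, is the normalized entropy of cluster sizes. The function name `cluster_balance` is hypothetical; it takes an item-to-cluster mapping as a plain dict.

```python
import math
from collections import Counter


def cluster_balance(item_to_cluster: dict, num_clusters: int) -> float:
    """Normalized entropy of cluster sizes (hypothetical balance metric).

    Returns 1.0 when items are spread evenly over all `num_clusters` clusters
    and approaches 0.0 as a single cluster absorbs everything. Empty clusters
    lower the score because the normalizer uses the full cluster count."""
    if num_clusters < 2:
        raise ValueError("need at least two clusters")
    counts = Counter(item_to_cluster.values())
    n = sum(counts.values())
    if n == 0:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_clusters)
```

A score near 1.0 means each cluster holds a similar share of items, so no single cluster dominates the candidate pool at serving time.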

Paper Structure

This paper contains 20 sections, 13 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The training framework of the proposed streaming VQ model.
  • Figure 2: The merge sort solution for fine-grained item ranking. Clusters are selected by personalized (user-specific) scores, while popularity can be used to rank items within each cluster. The visualization shows the case where the chunk size is 1.
  • Figure 3: The two architectures of the ranking step model. Blue, yellow, and green blocks denote item-side, user-side, and cross features, respectively. The complicated architecture fuses user-side and item-side features earlier and therefore performs better, at the cost of higher computational overhead.
  • Figure 4: Cluster distributions of streaming VQ.
  • Figure 5: Detailed distribution of impressions. All bars show the relative difference from the HNSW Two-tower baseline.
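
The merge sort serving described for Figure 2 can be read as a k-way merge over per-cluster item lists. The sketch below assumes each item's final score is its cluster's personalized score plus its popularity. This combination rule and the function name `merge_sort_retrieve` are hypothetical; the paper may combine the two signals differently. Under that assumption, every popularity-sorted cluster list is also sorted by final score, so a heap merge produces the global order without scoring every item. `chunk_size=1` matches the figure. Larger chunks emit several consecutive items from the winning cluster per heap pop, trading exactness for fewer heap operations.

```python
import heapq


def merge_sort_retrieve(cluster_scores, cluster_items, top_clusters, n, chunk_size=1):
    """Hypothetical merge sort serving.

    cluster_scores: {cluster_id: personalized score for this user}
    cluster_items:  {cluster_id: [(item_id, popularity), ...]} sorted by popularity, descending
    Assumed final item score = cluster score + popularity.
    """
    # 1. Personalization picks which clusters to open.
    chosen = sorted(cluster_scores, key=cluster_scores.get, reverse=True)[:top_clusters]

    # 2. Seed a max-heap (negated scores) with the head of each chosen cluster.
    heap = []
    for cid in chosen:
        items = cluster_items.get(cid, [])
        if items:
            heapq.heappush(heap, (-(cluster_scores[cid] + items[0][1]), cid, 0))

    # 3. Pop the best cluster head and emit up to `chunk_size` items from it.
    result = []
    while heap and len(result) < n:
        _, cid, pos = heapq.heappop(heap)
        items = cluster_items[cid]
        for item_id, _pop in items[pos:pos + chunk_size]:
            if len(result) == n:
                break
            result.append(item_id)
        nxt = pos + chunk_size
        if nxt < len(items):
            heapq.heappush(heap, (-(cluster_scores[cid] + items[nxt][1]), cid, nxt))
    return result
```

For example, with cluster scores `{0: 1.0, 1: 0.5}` and items `{0: [("a", 0.3), ("b", 0.1)], 1: [("c", 0.9), ("d", 0.0)]}`, chunk size 1 interleaves the clusters by final score, while chunk size 2 drains one cluster's chunk before switching.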