KuaiFormer: Transformer-Based Retrieval at Kuaishou

Chi Liu, Jiangxia Cao, Rui Huang, Kai Zheng, Qiang Luo, Kun Gai, Guorui Zhou

TL;DR

KuaiFormer fundamentally redefines the retrieval process by shifting from conventional score estimation tasks to a transformer-driven Next Action Prediction paradigm, which enables more effective real-time interest acquisition and multi-interest extraction, significantly enhancing retrieval performance.

Abstract

In large-scale content recommendation systems, retrieval is the initial stage of the pipeline, responsible for selecting thousands of candidate items from billions of options to pass on to the ranking modules. Traditionally, the dominant retrieval method has been Embedding-Based Retrieval (EBR) using a Deep Neural Network (DNN) dual-tower structure. Applying transformers to retrieval tasks has been a focus of recent research, but real-world industrial deployment still presents significant challenges. In this paper, we introduce KuaiFormer, a novel transformer-based retrieval framework deployed in a large-scale content recommendation system. KuaiFormer fundamentally redefines the retrieval process by shifting from conventional score estimation tasks (such as click-through rate estimation) to a transformer-driven Next Action Prediction paradigm. This shift enables more effective real-time interest acquisition and multi-interest extraction, significantly enhancing retrieval performance. KuaiFormer has been integrated into the Kuaishou App's short-video recommendation system since May 2024, serving over 400 million daily active users and producing a marked increase in the average daily usage time of Kuaishou users. We provide insights into both the technical and business aspects of deploying transformers in large-scale recommendation systems, addressing practical challenges encountered during industrial implementation. Our findings offer valuable guidance for engineers and researchers aiming to leverage transformer models to optimize large-scale content recommendation systems.

Paper Structure

This paper contains 26 sections, 9 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: KuaiFormer architecture with a sequence length of 256 and 4 query tokens, where $\texttt{t}^{\texttt{early}}_{1}$ denotes early item compression and $\texttt{t}^{\texttt{mid}}_{1}$ denotes middle item compression. A longer sequence of 256 items can be modeled effectively by feeding in a shorter sequence of 55 tokens, which keeps training and inference efficient.
  • Figure 2: Deployment Architecture
  • Figure 3: Impact of sequence length, number of query tokens, and number of layers on accuracy
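
The Figure 1 caption describes compressing a 256-item history into 55 input tokens, with coarser compression for early items than for middle ones. The sketch below illustrates one way such a reduction could work. The span lengths and group sizes (128 early items in groups of 64, 80 middle items in groups of 16, 48 recent items kept as-is) are assumptions chosen only because they add up to 55, and mean pooling is a stand-in for whatever compression module the model actually uses. The function name `compress_sequence` is hypothetical.

```python
import numpy as np


def compress_sequence(items, early=(128, 64), mid=(80, 16)):
    """Compress an (L, d) array of item embeddings, ordered oldest first.

    `early` and `mid` are (span_length, group_size) pairs. The defaults
    are assumed values, picked so that 256 items become 55 tokens; they
    are not taken from the caption.
    """
    early_len, early_group = early
    mid_len, mid_group = mid
    d = items.shape[1]

    # Oldest span: mean-pool consecutive groups into a few coarse tokens.
    early_tokens = items[:early_len].reshape(-1, early_group, d).mean(axis=1)

    # Middle span: same idea with smaller groups, so less detail is lost.
    mid_span = items[early_len:early_len + mid_len]
    mid_tokens = mid_span.reshape(-1, mid_group, d).mean(axis=1)

    # Most recent items are kept uncompressed.
    recent_tokens = items[early_len + mid_len:]

    return np.concatenate([early_tokens, mid_tokens, recent_tokens], axis=0)


rng = np.random.default_rng(0)
history = rng.normal(size=(256, 8))   # 256 items, toy embedding size 8
compressed = compress_sequence(history)
print(compressed.shape)               # (55, 8): 2 early + 5 mid + 48 recent
```

Under these assumptions the transformer attends over 55 positions instead of 256. Since self-attention cost grows quadratically with sequence length, this reduction is what makes the longer history affordable.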