Large Memory Network for Recommendation

Hui Lu, Zheng Chai, Yuchao Zheng, Zhe Chen, Deping Xie, Peng Xu, Xun Zhou, Di Wu

TL;DR

The paper addresses long-horizon user interest modeling in recommendation by introducing Large Memory Network (LMN), a memory-augmented framework that compresses user history into a global, million-scale memory block. LMN extracts memory through a two-axis activation with user-aware queries, uses product quantization to reduce computational cost, and injects learned information through a memory-loss term, which allows joint training with CTR objectives. Offline results on Douyin ECS show LMN outperforming baselines in AUC and LogLoss, with performance improving as memory size increases; online A/B tests indicate a modest latency impact alongside meaningful gains in orders per user and per search. The work demonstrates a practical path to industrial-scale memory-augmented recommendation, validated by real-world deployment, and suggests memorizing other high-signal components beyond historical interactions as a future direction.

Abstract

Modeling user behavior sequences in recommender systems is essential for understanding user preferences over time, enabling personalized and accurate recommendations that improve user retention and business value. Despite its significance, current sequential modeling approaches face two challenges. In the spatial dimension, it is difficult to mutually perceive similar users' interests for a generalized understanding of intent; in the temporal dimension, current methods are prone to forgetting long-term interests because of the fixed-length input sequence. In this paper, we present Large Memory Network (LMN), which compresses and stores user history behavior information in a large-scale memory block. With a carefully designed online deployment strategy, the memory block can be easily scaled up to million-scale in industry. Extensive offline comparison experiments, memory scaling experiments, and an online A/B test on Douyin E-Commerce Search (ECS) validate the superior performance of LMN. LMN is currently fully deployed in Douyin ECS, serving millions of users each day.

Paper Structure

This paper contains 14 sections, 9 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overall framework of the proposed LMN. The memory block has two learnable components: memory keys and memory values. LMN uses a two-axis activation to locate the top $K$ relevant values for a given query. In addition, a Smooth L1 loss injects historical interactions, together with the corresponding user information, into the memory value slots. The optional mask is used because, in practice, some sequence items are zero-embedding padding added to bring the sequence to a fixed length.
  • Figure 2: Online deployment framework of memory parameter server (MPS) for the proposed LMN at ByteDance.
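
The lookup and loss described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the two-axis activation behaves like a product-key lookup (the query is split in half, each half is scored against one axis of keys, and the per-axis scores are summed), and it assumes a softmax over the selected scores for weighting. The names `two_axis_topk`, `masked_smooth_l1`, `row_keys`, `col_keys`, and `beta` are hypothetical, and the pairing of prediction and target in the loss is only a placeholder.

```python
import numpy as np


def two_axis_topk(query, row_keys, col_keys, k):
    """Pick the top-k slots of an (n_rows x n_cols) memory grid without scoring every slot.

    Assumption: a slot's score is row_score + col_score (product-key style).
    """
    half = query.shape[0] // 2
    row_scores = row_keys @ query[:half]   # (n_rows,)
    col_scores = col_keys @ query[half:]   # (n_cols,)

    # The k best slots by summed score must lie in (top-k rows) x (top-k columns).
    top_r = np.argsort(-row_scores)[:k]
    top_c = np.argsort(-col_scores)[:k]
    cand = row_scores[top_r][:, None] + col_scores[top_c][None, :]  # (k, k)

    best = np.argsort(-cand.ravel())[:k]
    r, c = np.unravel_index(best, cand.shape)
    slots = top_r[r] * col_keys.shape[0] + top_c[c]  # flat index into the value table

    # Softmax over the selected scores (an assumed weighting scheme).
    scores = cand[r, c]
    weights = np.exp(scores - scores.max())
    return slots, weights / weights.sum()


def masked_smooth_l1(pred, target, mask, beta=1.0):
    """Smooth L1 averaged over unmasked sequence positions; mask is 0 for padded items."""
    diff = np.abs(pred - target)
    per_elem = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    per_item = per_elem.mean(axis=-1)                 # (seq_len,)
    return float((per_item * mask).sum() / max(mask.sum(), 1.0))


# Toy usage with random data (sizes are illustrative only).
rng = np.random.default_rng(0)
n_rows, n_cols, dim, k = 16, 16, 8, 4
row_keys = rng.normal(size=(n_rows, dim // 2))
col_keys = rng.normal(size=(n_cols, dim // 2))
values = rng.normal(size=(n_rows * n_cols, dim))      # memory value slots
query = rng.normal(size=dim)

slots, weights = two_axis_topk(query, row_keys, col_keys, k)
memory_out = weights @ values[slots]                  # weighted read, shape (dim,)

# Placeholder pairing: compare the read-out with each history item, ignoring padding.
history = rng.normal(size=(5, dim))
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])            # last two items are padding
loss = masked_smooth_l1(np.tile(memory_out, (5, 1)), history, mask)
```

Under these assumptions, the lookup scores only `n_rows + n_cols` keys plus a `k x k` candidate grid instead of all `n_rows * n_cols` slots, which is what makes a very large value table affordable to query.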