One Backpropagation in Two Tower Recommendation Models

Erjia Chen, Bang Wang

TL;DR

This paper proposes a one-backpropagation updating strategy for two-tower recommendation models. It keeps normal gradient backpropagation for the item encoding tower but cuts off backpropagation for the user encoding tower, and instead updates each user encoding in every training epoch with a moving-aggregation strategy.

Abstract

Recent years have witnessed extensive research on two-tower recommendation models for relieving information overload. Four building modules can be identified in such models, namely user-item encoding, negative sampling, loss computing, and backpropagation updating. To the best of our knowledge, existing algorithms have studied only the first three modules and have neglected the backpropagation module. They all adopt a two-backpropagation strategy, which rests on the implicit assumption that users and items should be treated equally in the training phase. In this paper, we challenge this equal-training assumption and propose a novel one-backpropagation updating strategy, which keeps the normal gradient backpropagation for the item encoding tower but cuts off the backpropagation for the user encoding tower. Instead, we propose a moving-aggregation updating strategy to update a user encoding in each training epoch. Apart from the proposed backpropagation updating module, we implement the other three modules with the most straightforward choices. Experiments on four public datasets validate the effectiveness and efficiency of our model, showing improved recommendation performance and reduced computational cost compared with state-of-the-art competitors.
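The training scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not give the exact update formulas, so the code below assumes dot-product scoring, uniform negative sampling, a BPR-style loss, and a moving-aggregation rule of the form u ← β·u + (1 − β)·mean(sampled interacted item encodings). The function name `train_one_bp` and the roles given to `beta` and `n_s` are hypothetical and chosen only for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_one_bp(interactions, num_items, dim=8, epochs=5,
                 lr=0.05, beta=0.9, n_s=2, seed=0):
    """Toy one-backpropagation loop (illustrative assumptions throughout).

    interactions: dict mapping user ids 0..U-1 to non-empty lists of item ids.
    Returns (user_vectors, item_vectors) as lists of lists.
    """
    rng = random.Random(seed)
    num_users = len(interactions)
    users = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    items = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_items)]

    for _ in range(epochs):
        for u, pos in interactions.items():
            for i in pos:
                # Negative sampling (assumed uniform over non-interacted items).
                j = rng.randrange(num_items)
                while j in pos:
                    j = rng.randrange(num_items)
                # Assumed BPR-style loss: L = -log(sigmoid(x)).
                x = dot(users[u], items[i]) - dot(users[u], items[j])
                g = 1.0 - 1.0 / (1.0 + math.exp(-x))  # equals -dL/dx
                # Gradient step on the item side only.
                for d in range(dim):
                    step = lr * g * users[u][d]
                    items[i][d] += step
                    items[j][d] -= step
                # No gradient step on users[u]: backpropagation to the
                # user side is cut off.
        # Assumed moving-aggregation update, applied once per epoch.
        for u, pos in interactions.items():
            sampled = rng.sample(pos, min(n_s, len(pos)))
            for d in range(dim):
                agg = sum(items[i][d] for i in sampled) / len(sampled)
                users[u][d] = beta * users[u][d] + (1.0 - beta) * agg
    return users, items


if __name__ == "__main__":
    toy = {0: [0, 1], 1: [2]}
    user_vecs, item_vecs = train_one_bp(toy, num_items=4)
    print(len(user_vecs), len(item_vecs))
```

In this sketch, setting `beta=1.0` leaves the user encodings at their initial values while the item encodings still change, which makes the cut-off of the user-side gradient easy to see. A conventional two-backpropagation model would also apply a gradient step to `users[u]` inside the inner loop.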

Paper Structure (20 sections, 8 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Illustration of a typical two tower recommendation model with four building modules, namely, user-item encoding, negative sampling, loss computing, and backpropagation updating.
  • Figure 2: The proposed OneBP model: It cuts off the gradient backpropagation to the user encoder and uses moving-aggregation for user encoding update.
  • Figure 3: Performance of OneBP on different encoders.
  • Figure 4: Visualization of clustering on items' encodings at initialization and after model training by different strategies.
  • Figure 5: Hyper-parameter study for $N_s$ and $\beta$.
  • ...and 2 more figures