Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method

Ze Liu; Jin Zhang; Chao Feng; Defu Lian; Jie Wang; Enhong Chen

TL;DR

The paper tackles the efficiency bottleneck of deep recommender systems by proposing a Deep Tree-based Retriever (DTR) that jointly learns a tree index satisfying the max-heap assumption and a neural preference model. It replaces the node-wise binary classification of prior work with softmax-based multi-class classification over the nodes at each tree level, introduces label rectification to address the suboptimality of non-leaf labels under beam search, and uses a tree-based sampled softmax scheme for scalable training. Theoretical analysis shows improved generalization and a path to Bayes-optimal top-k retrieval under beam search, while experiments on four real-world datasets report gains in top-k metrics and retrieval efficiency. Overall, DTR couples index learning tightly with preference modeling and supplies principled label corrections and sampling schemes to maintain accuracy at scale.

Abstract

Although advancements in deep learning have significantly enhanced the recommendation accuracy of deep recommendation models, these methods still suffer from low recommendation efficiency. Recently proposed tree-based deep recommendation models alleviate the problem by directly learning tree structure and representations under the guidance of recommendation objectives. To guarantee the effectiveness of beam search for recommendation accuracy, these models strive to ensure that the tree adheres to the max-heap assumption, where a parent node's preference should be the maximum among its children's preferences. However, they employ a one-versus-all strategy, framing the training task as a series of independent binary classification objectives for each node, which limits their ability to fully satisfy the max-heap assumption. To this end, we propose a Deep Tree-based Retriever (DTR for short) for efficient recommendation. DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level, enabling explicit horizontal competition and more discriminative top-k selection among them, which mimics the beam search behavior during training. To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function, which further aligns with the max-heap assumption in expectation. As the number of tree nodes grows exponentially with the levels, we employ sampled softmax to approximate optimization and thereby enhance efficiency. Furthermore, we propose a tree-based sampling method to reduce the bias inherent in sampled softmax. Theoretical results reveal DTR's generalization capability, and both the rectification method and tree-based sampling contribute to improved generalization. The experiments are conducted on four real-world datasets, validating the effectiveness of the proposed method.
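The two ideas in the abstract that are easiest to make concrete are the max-heap assumption behind beam search and the level-wise softmax objective. The following is a minimal sketch, not the authors' implementation: it assumes a complete tree stored as per-level score lists, uses exact parent-equals-max scores in place of a learned preference model, and omits label rectification and sampled softmax. The function names `build_max_heap_scores`, `beam_search`, and `layerwise_softmax_loss` are hypothetical.

```python
import heapq
import math


def build_max_heap_scores(leaf_scores, branch=2):
    """Return per-level score lists (root level first).

    Each parent score is the maximum of its children's scores, i.e. an
    idealized tree that satisfies the max-heap assumption exactly.
    """
    levels = [list(leaf_scores)]
    while len(levels[0]) > 1:
        cur = levels[0]
        parents = [max(cur[i:i + branch]) for i in range(0, len(cur), branch)]
        levels.insert(0, parents)
    return levels


def beam_search(levels, k, branch=2):
    """Layer-wise retrieval: keep the k best nodes per level, expand their children."""
    beam = [0]  # index of the root within level 0
    for depth in range(1, len(levels)):
        scores = levels[depth]
        children = [
            c
            for p in beam
            for c in range(p * branch, min((p + 1) * branch, len(scores)))
        ]
        beam = heapq.nlargest(k, children, key=lambda c: scores[c])
    return beam


def layerwise_softmax_loss(level_logits, positive_leaf, branch=2):
    """Sum of per-level softmax cross-entropies (assumed form of the objective).

    At each level the label is the ancestor of the positive leaf, so nodes
    on the same level compete with each other.
    """
    loss, node = 0.0, positive_leaf
    for logits in reversed(level_logits[1:]):  # leaf level up to level 1
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_z - logits[node]
        node //= branch  # move to the parent index on the level above
    return loss


leaf_scores = [0.1, 0.7, 0.3, 0.9, 0.2, 0.4, 0.8, 0.05]
levels = build_max_heap_scores(leaf_scores)
print(beam_search(levels, k=2))  # [3, 6]
```

When the max-heap property holds exactly, as in this toy tree, beam search with beam size k returns the true top-k leaves (here leaves 3 and 6, with scores 0.9 and 0.8). With a learned model the property holds only approximately, which is the gap the paper's training objective and label rectification aim to narrow.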

Paper Structure (75 sections, 18 theorems, 100 equations, 8 figures, 9 tables, 1 algorithm)

Introduction
Related work
Efficient Recommendation
Efficient Training Techniques of Recommendation
Negative Sampling in RecSys
Techniques of Speeding Up Softmax Computation
Theoretical Work
Bayes Optimality in Multi-class Classification and Hierarchical Classification
Generalization Bounds for Multi-class Classification
Preliminaries
Problem Definition and Notation
Tree-based Model
Tree Index
Preference Model
Top-k Retrieval Process
...and 60 more sections

Key Result

Theorem 1

Given a convex, differentiable function $\psi:\mathbb{R}^N\mapsto \mathbb{R}$. Let $\boldsymbol{z}$ be a random vector taking values in $\mathbb{R}^N$ for which both $\mathbb{E}[\boldsymbol{z}]$ and $\mathbb{E}[\psi(\boldsymbol{z})]$ are finite. If continuous function $g:\mathbb{R}^N\mapsto \mathbb{

Figures (8)

Figure 1: Illustration of tree-based sampling along the tree. The number beside the edge is the child's expanding probability given the parent.
Figure 2: The process of updating the mapping between items and leaf nodes, where $d=2$ in this case.
Figure 3: The variation of $F\text{-}measure@20$ with tree updating for JTM and tree DTR variants across four datasets.
Figure 4: The $F\text{-}measure@20$ of DTR with varying numbers of negative samples on MIND and Movie.
Figure 5: The $F\text{-}measure@20$ of DTR with varying numbers of tree branches on MIND and Movie.
...and 3 more figures
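The caption of Figure 1 describes sampling along the tree using per-edge expanding probabilities. One way to read it, sketched below under stated assumptions, is that a node is drawn by walking from the root and choosing a child at each step, so the probability of reaching a leaf is the product of the edge probabilities on its path. The data layout (`edge_probs[d][c]` for node `c` at depth `d + 1`) and the function names are hypothetical; how the paper derives the edge probabilities and how it samples at intermediate levels are not covered by this summary.

```python
import random


def sample_leaf(edge_probs, branch=2, rng=random):
    """Walk from the root to a leaf, picking one child per level.

    edge_probs[d][c] is the assumed probability of expanding node c at
    depth d + 1 given its parent; siblings' probabilities sum to 1.
    Returns the sampled leaf index and its sampling probability.
    """
    node, prob = 0, 1.0
    for probs in edge_probs:
        children = list(range(node * branch, (node + 1) * branch))
        weights = [probs[c] for c in children]
        node = rng.choices(children, weights=weights, k=1)[0]
        prob *= probs[node]
    return node, prob


def leaf_probability(edge_probs, leaf, branch=2):
    """Probability of reaching `leaf`: product of edge probabilities on its path."""
    prob, node = 1.0, leaf
    for probs in reversed(edge_probs):
        prob *= probs[node]
        node //= branch  # parent index on the level above
    return prob


# Toy binary tree of depth 2 with made-up edge probabilities.
edge_probs = [[0.6, 0.4], [0.5, 0.5, 0.1, 0.9]]
leaf, prob = sample_leaf(edge_probs, rng=random.Random(0))
```

Because each sibling group's probabilities sum to 1, the leaf probabilities form a valid distribution, and the known value of `prob` is what a sampled-softmax correction term would need.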

Theorems & Definitions (36)

Definition 1: Top-$k$ Retrieval Bayes Optimal
Definition 2: Rank Consistent
Theorem 1: Theorem 3.1 of yang2020consistency
Proposition 1
proof
Definition 3: Label Rectification
Proposition 2
proof
Proposition 3
Theorem 2: Theorem 2.1 of blanc2018adaptive
...and 26 more
