Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning

Zihua Si; Zhongxiang Sun; Jiale Chen; Guozhang Chen; Xiaoxue Zang; Kai Zheng; Yang Song; Xiao Zhang; Jun Xu; Kun Gai

TL;DR

SEATER addresses the scalability gap in generative retrieval for large-scale recommendation by learning semantic, tree-structured item identifiers through contrastive learning. It uses a lightweight encoder-decoder Transformer to map a user's interaction history to a fixed-length identifier sequence, and constrained beam search to decode identifiers that follow valid paths in the tree. Training combines a generation loss with two contrastive tasks, InfoNCE alignment and triplet ranking, to capture token semantics, hierarchies, and inter-token dependencies. Across four datasets, including an industrial one, SEATER consistently outperforms state-of-the-art baselines while remaining efficient, owing to its balanced $k$-ary tree design and single-layer decoder, which makes it practical for large-scale recommender system deployments.
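The constrained decoding step mentioned above can be illustrated with a minimal sketch. It assumes the valid identifiers are stored in a prefix tree and that a scoring function returns the log-probability of the next token given the decoded prefix. The names `build_trie`, `constrained_beam_search`, and `toy_log_prob`, as well as the toy probabilities, are hypothetical and not taken from the paper. In a real system the scores would come from the decoder conditioned on the encoded user history.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def build_trie(identifiers: Sequence[Sequence[int]]) -> Dict:
    """Build a nested-dict prefix tree over all valid item identifiers."""
    root: Dict = {}
    for ident in identifiers:
        node = root
        for tok in ident:
            node = node.setdefault(tok, {})
    return root


def constrained_beam_search(
    trie: Dict,
    log_prob: Callable[[Tuple[int, ...], int], float],
    length: int,
    beam_size: int,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Beam search that only expands tokens that are children in the trie."""
    beams = [((), 0.0, trie)]  # (prefix, cumulative log-prob, trie node)
    for _ in range(length):
        candidates = []
        for prefix, score, node in beams:
            for tok, child in node.items():  # only valid continuations
                candidates.append(
                    (prefix + (tok,), score + log_prob(prefix, tok), child)
                )
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return [(prefix, score) for prefix, score, _ in beams]


# Toy example: a binary tree of depth 2 where every node has its own token.
IDENTIFIERS = [(1, 3), (1, 4), (2, 5), (2, 6)]
TOY_PROBS = {1: 0.6, 2: 0.4, 3: 0.1, 4: 0.9, 5: 0.5, 6: 0.5}  # made-up values


def toy_log_prob(prefix: Tuple[int, ...], token: int) -> float:
    """Stand-in for the decoder; ignores the prefix in this toy setting."""
    return math.log(TOY_PROBS[token])


TOP = constrained_beam_search(build_trie(IDENTIFIERS), toy_log_prob, 2, 2)
```

Because every expansion is drawn from the children of the current trie node, each returned sequence is guaranteed to be an existing item identifier, and with fixed-length identifiers the number of decoding steps is the same for every item.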

Abstract

The retrieval phase is a vital component in recommendation systems, requiring the model to be effective and efficient. Recently, generative retrieval has become an emerging paradigm for document retrieval, showing notable performance. These methods enjoy merits like being end-to-end differentiable, suggesting their viability in recommendation. However, these methods fall short in efficiency and effectiveness for large-scale recommendations. To obtain efficiency and effectiveness, this paper introduces a generative retrieval framework, namely SEATER, which learns SEmAntic Tree-structured item identifiERs via contrastive learning. Specifically, we employ an encoder-decoder model to extract user interests from historical behaviors and retrieve candidates via tree-structured item identifiers. SEATER devises a balanced k-ary tree structure of item identifiers, allocating semantic space to each token individually. This strategy maintains semantic consistency within the same level, while distinct levels correlate to varying semantic granularities. This structure also maintains consistent and fast inference speed for all items. Considering the tree structure, SEATER learns identifier tokens' semantics, hierarchical relationships, and inter-token dependencies. To achieve this, we incorporate two contrastive learning tasks with the generation task to optimize both the model and identifiers. The infoNCE loss aligns the token embeddings based on their hierarchical positions. The triplet loss ranks similar identifiers in desired orders. In this way, SEATER achieves both efficiency and effectiveness. Extensive experiments on three public datasets and an industrial dataset have demonstrated that SEATER outperforms state-of-the-art models significantly.
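To make the three training signals named in the abstract more concrete, the following is a minimal sketch in plain Python. It shows a generic InfoNCE loss, a generic margin-based triplet loss, and a weighted sum with the generation loss. The exact pairing of positives and negatives, the temperature, the margin, and the weights `w_align` and `w_rank` are assumptions for illustration; the paper's own definitions may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def info_nce(anchor: Vector, positive: Vector,
             negatives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Generic InfoNCE: -log softmax of the positive among all candidates.

    Assumed usage: anchor = a token embedding, positive = a token that is
    hierarchically related to it, negatives = unrelated tokens.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def triplet_loss(score_pos: float, score_neg: float,
                 margin: float = 1.0) -> float:
    """Hinge-style ranking loss: the positive should outscore the negative
    by at least `margin`."""
    return max(0.0, margin - score_pos + score_neg)


def multitask_loss(generation: float, alignment: float, ranking: float,
                   w_align: float = 1.0, w_rank: float = 1.0) -> float:
    """Weighted sum of the three losses (weights are hypothetical)."""
    return generation + w_align * alignment + w_rank * ranking
```

For example, `info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])` is close to zero because the positive is perfectly aligned with the anchor, while swapping the positive and the negative yields a large loss.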

Paper Structure (28 sections, 9 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Method
Overview
Retrieval Model
Encoder-Decoder Architecture
Item Identifiers
Training
Generation Loss
Alignment Loss
Ranking Loss
Multi-task Training
Inference
Discussion
Comparison with Existing Work
...and 13 more sections

Figures (5)

Figure 1: A brief illustration of SEATER. The retrieval model encodes the interacted items $\boldsymbol{x}=[x_1,x_2,\cdots,x_t]$ of user $u$ and decodes the identifier $\boldsymbol{y}=[y_1,y_2,\cdots,y_l]$ of item $v$.
Figure 2: The proposed tree-structured identifiers and multi-task learning scheme. (a) An example of a balanced $k$-ary tree structure of item identifiers. Here $k$ equals 2 for simplicity. In practice, $k$ can be any integer $\geq 2$. '9' denotes the start token. Each tree node corresponds to a unique token. (b)-(d) denote three losses for different tasks. (b) Generation Loss: guides the model to decode item identifiers. (c) Alignment Loss: captures the semantics and hierarchies of tokens. (d) Ranking Loss: differentiates between similar identifiers.
Figure 3: Different methods to construct identifiers. The collaborative filtering information and balanced structure make identifiers more informative.
Figure 4: Impact of branch number $k$, ranging from $2$ to $32$, in terms of R@50 and HR@50. The corresponding identifier length $l$ is also annotated.
Figure 5: Analysis of the number of transformer layers.
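The trade-off between the branch number $k$ and the identifier length $l$ noted in the Figure 4 caption can be illustrated with a small sketch. It assumes that $l$ is the depth of a balanced $k$-ary tree whose leaves hold all items, that is, the smallest $l$ with $k^l \geq N$ for $N$ items. Whether the paper's $l$ also counts a start token or a leaf token is not stated here, and the catalogue size of one million items is a made-up example.

```python
def identifier_length(num_items: int, k: int) -> int:
    """Smallest depth l such that k**l >= num_items.

    Uses integer arithmetic to avoid floating-point errors from log().
    """
    if num_items < 1 or k < 2:
        raise ValueError("need num_items >= 1 and k >= 2")
    depth, capacity = 0, 1
    while capacity < num_items:
        capacity *= k
        depth += 1
    return depth


# Hypothetical catalogue size; sweep the same range of k as Figure 4.
NUM_ITEMS = 1_000_000
LENGTHS = {k: identifier_length(NUM_ITEMS, k) for k in (2, 4, 8, 16, 32)}
```

A larger $k$ shortens the identifier, which means fewer decoding steps, but widens the choice at every step. Because the tree is balanced, all items share the same identifier length, which is consistent with the abstract's claim of uniform inference speed across items.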
