Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning

Zihua Si, Zhongxiang Sun, Jiale Chen, Guozhang Chen, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, Jun Xu, Kun Gai

TL;DR

SEATER addresses the scalability gap in generative retrieval for large-scale recommendation by learning semantic, tree-structured item identifiers through contrastive learning. It uses a lightweight encoder-decoder Transformer to map a user's interaction history to a fixed-length identifier sequence, and constrained beam search to decode identifiers that follow valid paths in the tree. Training combines a generation loss with two contrastive tasks (infoNCE alignment and triplet ranking) to capture token semantics, hierarchies, and inter-token dependencies. Across four datasets, including an industrial one, SEATER consistently outperforms state-of-the-art baselines while remaining efficient thanks to its balanced $k$-ary tree design and single-layer decoder, which makes it practical for large-scale recommender-system deployments.
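
The constrained decoding step mentioned above can be illustrated with a small sketch. It assumes the valid identifiers are stored in a prefix trie and that a hypothetical `score_fn(prefix, token)` returns the decoder's log-probability for the next token. Neither name comes from the paper, and the toy scorer below only stands in for a real model.

```python
def build_trie(identifiers):
    """Nest dicts so that each root-to-leaf path spells one valid identifier."""
    trie = {}
    for ident in identifiers:
        node = trie
        for tok in ident:
            node = node.setdefault(tok, {})
    return trie


def constrained_beam_search(score_fn, trie, length, beam_size):
    """Beam search that only expands tokens that are children in the trie.

    score_fn(prefix, token) is a hypothetical stand-in for the decoder's
    log-probability of `token` given the already decoded `prefix`.
    """
    beams = [((), 0.0, trie)]  # (decoded prefix, cumulative log-prob, trie node)
    for _ in range(length):
        candidates = []
        for prefix, logp, node in beams:
            for tok, child in node.items():  # invalid tokens are never scored
                candidates.append((prefix + (tok,), logp + score_fn(prefix, tok), child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return [(prefix, logp) for prefix, logp, _ in beams]


# Toy usage: four items with length-2 identifiers, and a scorer that simply
# prefers smaller token ids (an assumption made purely for illustration).
valid_identifiers = [(1, 3), (1, 4), (2, 5), (2, 6)]
trie = build_trie(valid_identifiers)
results = constrained_beam_search(lambda prefix, tok: -0.1 * tok, trie, length=2, beam_size=2)
decoded = [prefix for prefix, _ in results]
```

Because every expansion walks the trie, each returned sequence is guaranteed to be an identifier of an existing item, and every item is reached in the same number of decoding steps.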

Abstract

The retrieval phase is a vital component in recommendation systems, requiring the model to be effective and efficient. Recently, generative retrieval has become an emerging paradigm for document retrieval, showing notable performance. These methods enjoy merits like being end-to-end differentiable, suggesting their viability in recommendation. However, these methods fall short in efficiency and effectiveness for large-scale recommendations. To obtain efficiency and effectiveness, this paper introduces a generative retrieval framework, namely SEATER, which learns SEmAntic Tree-structured item identifiERs via contrastive learning. Specifically, we employ an encoder-decoder model to extract user interests from historical behaviors and retrieve candidates via tree-structured item identifiers. SEATER devises a balanced k-ary tree structure of item identifiers, allocating semantic space to each token individually. This strategy maintains semantic consistency within the same level, while distinct levels correlate to varying semantic granularities. This structure also maintains consistent and fast inference speed for all items. Considering the tree structure, SEATER learns identifier tokens' semantics, hierarchical relationships, and inter-token dependencies. To achieve this, we incorporate two contrastive learning tasks with the generation task to optimize both the model and identifiers. The infoNCE loss aligns the token embeddings based on their hierarchical positions. The triplet loss ranks similar identifiers in desired orders. In this way, SEATER achieves both efficiency and effectiveness. Extensive experiments on three public datasets and an industrial dataset have demonstrated that SEATER outperforms state-of-the-art models significantly.
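
To make the balanced $k$-ary tree idea concrete, here is a minimal sketch under stated assumptions. It assumes the items arrive already ordered so that neighbours are similar, and it splits that ordering into near-equal chunks; this is a simplification standing in for clustering on learned item embeddings, not the authors' construction procedure. The function name `build_identifiers` and the choice to reserve token 0 for a start token are hypothetical.

```python
from itertools import count


def build_identifiers(items, k):
    """Give every item a fixed-length path of unique node tokens.

    Assumption: `items` is ordered so that similar items are adjacent.
    Returns (identifiers, depth), where identifiers maps item -> token tuple.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    items = list(items)
    if not items:
        return {}, 0

    # Smallest depth whose k-ary tree has enough leaves for all items.
    depth = 1
    while k ** depth < len(items):
        depth += 1

    next_token = count(1)  # token 0 is left free for a start token (hypothetical)
    identifiers = {}

    def split(group, prefix, remaining):
        if remaining == 0:
            identifiers[group[0]] = prefix  # exactly one item per leaf
            return
        parts = min(k, len(group))
        base, extra = divmod(len(group), parts)
        start = 0
        for i in range(parts):
            size = base + (1 if i < extra else 0)  # chunk sizes differ by at most 1
            child = group[start:start + size]
            start += size
            # Each tree node receives its own fresh token.
            split(child, prefix + (next(next_token),), remaining - 1)

    split(items, (), depth)
    return identifiers, depth


ids, depth = build_identifiers(list("abcdefg"), k=2)
```

In this toy run every identifier has the same length (`depth`), items that are close in the ordering share a longer prefix, and no token is reused across nodes, which mirrors the properties the abstract attributes to the tree design.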

Paper Structure

This paper contains 28 sections, 9 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: A brief illustration of SEATER. The retrieval model encodes the interacted items $\boldsymbol{x}=[x_1,x_2,\cdots,x_t]$ of user $u$ and decodes the identifier $\boldsymbol{y}=[y_1,y_2,\cdots,y_l]$ of item $v$.
  • Figure 2: The proposed tree-structured identifiers and multi-task learning scheme. (a) An example of a balanced $k$-ary tree structure of item identifiers. Here $k$ equals 2 for simplicity. In practice, $k$ can be any integer $\geq 2$. '9' denotes the start token. Each tree node corresponds to a unique token. (b)-(d) denote three losses for different tasks. (b) Generation Loss: guides the model to decode item identifiers. (c) Alignment Loss: captures the semantics and hierarchies of tokens. (d) Ranking Loss: differentiates between similar identifiers.
  • Figure 3: Different methods to construct identifiers. The collaborative filtering information and balanced structure make identifiers more informative.
  • Figure 4: Impact of branch number $k$, ranging from $2$ to $32$, in terms of R@50 and HR@50. The corresponding identifier length $l$ is also annotated.
  • Figure 5: Analysis of the number of transformer layers.
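
The alignment and ranking losses named in the Figure 2 caption build on two standard contrastive objectives. The sketch below shows their generic textbook forms in plain Python, assuming cosine similarity, a temperature of 0.1, and a margin of 0.5. These choices and the function names are illustrative assumptions, and the paper's exact formulations (how positives, negatives, and margins are chosen from the tree) may differ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE: negative log-softmax of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet loss on cosine distance: the positive should be closer by `margin`."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-D embeddings (made up for illustration).
anchor, near, far = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
aligned = info_nce(anchor, near, [far])     # positive matches the anchor: small loss
misaligned = info_nce(anchor, far, [near])  # positive is orthogonal: large loss
well_ranked = triplet_loss(anchor, near, far)
badly_ranked = triplet_loss(anchor, far, near)
```

Both losses shrink as the positive moves closer to the anchor than the negatives do, which is the behaviour an alignment or ranking objective over tree tokens would rely on.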