Differentiable Geometric Indexing for End-to-End Generative Retrieval

Xujing Wang, Yufeng Chen, Boxuan Zhang, Jie Zhao, Chao Wei, Cai Xu, Ziyu Guan, Wei Zhao, Weiru Zhang, Xiaoyi Zeng

TL;DR

Differentiable Geometric Indexing (DGI) enforces Operational Unification to bridge the optimization gap between index construction and the retrieval objective. It exhibits superior robustness in long-tail scenarios, validating the necessity of harmonizing structural differentiability with geometric isotropy.

Abstract

Generative Retrieval (GR) has emerged as a promising paradigm to unify indexing and search within a single probabilistic framework. However, existing approaches suffer from two intrinsic conflicts: (1) an Optimization Blockage, where the non-differentiable nature of discrete indexing blocks gradient flow, decoupling index construction from the downstream retrieval objective; and (2) a Geometric Conflict, where standard unnormalized inner-product objectives induce norm-inflation instability, causing popular "hub" items to geometrically overshadow relevant long-tail items. To systematically resolve these misalignments, we propose Differentiable Geometric Indexing (DGI). First, to bridge the optimization gap, DGI enforces Operational Unification. It employs Soft Teacher Forcing via Gumbel-Softmax to establish a fully differentiable pathway, combined with Symmetric Weight Sharing to align the quantizer's indexing space with the retriever's decoding space. Second, to restore geometric fidelity, DGI introduces Isotropic Geometric Optimization. We replace inner-product logits with scaled cosine similarity on the unit hypersphere to decouple popularity bias from semantic relevance. Extensive experiments on large-scale industry search datasets and an online e-commerce platform demonstrate that DGI outperforms competitive sparse, dense, and generative baselines. Notably, DGI exhibits superior robustness in long-tail scenarios, validating the necessity of harmonizing structural differentiability with geometric isotropy.
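
To make the Geometric Conflict concrete, the following minimal Python sketch compares unnormalized inner-product logits with scaled cosine logits on toy 2-D vectors. The vectors, the scale value of 10.0, and all function names are illustrative assumptions, not values or code from the paper.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Project a non-zero vector onto the unit hypersphere."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def inner_product_logits(query, items):
    """Unnormalized logits: sensitive to the norm of each item vector."""
    return [dot(query, item) for item in items]


def scaled_cosine_logits(query, items, scale=10.0):
    """Scaled cosine logits: only the angle to the query matters.

    `scale` is a hypothetical temperature-like constant chosen for this toy.
    """
    q = normalize(query)
    return [scale * dot(q, normalize(item)) for item in items]


# Toy setup (assumed): a large-norm "hub" item that is only loosely aligned
# with the query, and a small-norm "tail" item that is closely aligned.
query = [1.0, 0.0]
hub_item = [3.0, 4.0]    # norm 5.0, cosine with query 0.6
tail_item = [0.9, 0.1]   # norm about 0.91, cosine with query about 0.99

ip_logits = inner_product_logits(query, [hub_item, tail_item])
cos_logits = scaled_cosine_logits(query, [hub_item, tail_item])

print("inner product:", ip_logits)   # hub scores higher (3.0 vs 0.9)
print("scaled cosine:", cos_logits)  # tail scores higher (about 6.0 vs 9.94)
```

Under the inner product the hub item wins purely because of its norm, while the cosine form ranks the better-aligned tail item first. This is only a two-item illustration of the norm effect; it says nothing about how strongly the effect appears in real data.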

Paper Structure

This paper contains 40 sections, 1 theorem, 24 equations, 6 figures, and 3 tables.

Key Result

Theorem 1

Under Assumptions A1-A4, if the learning rate sequence satisfies the Robbins-Monro conditions ($\sum \eta_t = \infty, \sum \eta_t^2 < \infty$), applying SGD on $\mathbf{w}$ guarantees that the Riemannian gradient norm vanishes asymptotically in the sense of limit inferior: This implies that there exists a subsequence of iterates $\{\theta_{t_k}\}$ such that $\|\operatorname{grad}_{\mathbb{S}^{d-1
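
The theorem statement is cut off in this listing, so the sketch below does not attempt to reproduce it or its Assumptions A1-A4. It only illustrates two standard ingredients the statement names: a step-size schedule satisfying the Robbins-Monro conditions, and the Riemannian gradient on the unit sphere, obtained by projecting the Euclidean gradient onto the tangent space. The toy objective, the noise-free gradient, the renormalization step, and all names are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def riemannian_grad(w, euclid_grad):
    """Tangent-space projection on the unit sphere: g - <g, w> w."""
    radial = dot(euclid_grad, w)
    return [g - radial * wi for g, wi in zip(euclid_grad, w)]


def step_size(t, eta0=1.0):
    """eta_t = eta0 / t: the sum diverges, the sum of squares converges."""
    return eta0 / t


# Hypothetical toy objective f(w) = -<w, target>, minimized on the sphere
# when w equals target. Its Euclidean gradient is simply -target.
target = normalize([1.0, 2.0, 2.0])
w = normalize([1.0, 0.0, 0.0])

grad_norms = []
for t in range(1, 1001):
    euclid_grad = [-a for a in target]  # exact gradient, no sampling noise
    rg = riemannian_grad(w, euclid_grad)
    grad_norms.append(math.sqrt(dot(rg, rg)))
    eta = step_size(t)
    # Gradient step followed by renormalization back onto the sphere.
    w = normalize([wi - eta * r for wi, r in zip(w, rg)])

print("first Riemannian gradient norm:", grad_norms[0])
print("last Riemannian gradient norm: ", grad_norms[-1])
```

In this toy run the Riemannian gradient norm shrinks toward zero as the iterate aligns with `target`. A real stochastic setting would add gradient noise, which is where the square-summability of the step sizes becomes important.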

Figures (6)

  • Figure 1: Illustration of the Structural Mismatch and Geometric Anisotropy in existing GR frameworks compared to our DGI.
  • Figure 2: Schematic Overview of the Differentiable Geometric Indexing (DGI) Framework. Unlike two-stage methods that block optimization at the discrete indexing step, DGI establishes a fully differentiable pathway. (1) Operational Unification: During training, we employ Gumbel-Softmax to generate soft quantized vectors. These are fed directly into the decoder via Soft Teacher Forcing, enabling gradients (visualized as green dashed lines) to flow from the $\mathcal{L}_{NTP}$ loss back to the item encoder. We also enforce Symmetric Weight Sharing between the quantization codebooks and the decoder's prediction head to ensure a unified representation space. (2) Geometric Optimization: The entire framework is optimized under spherical constraints (Scaled Cosine) to mitigate hubness.
  • Figure 3: Analysis of Optimization Stability (Gradient Norms). Visualization of gradient magnitude variance on the same dataset. (a) DGI exhibits smooth and consistent gradient flows, indicating that our Soft Teacher Forcing effectively stabilizes the backward pass. (b) The STE Baseline (replacing Gumbel-Softmax with STE) suffers from severe oscillation and high-variance spikes. This contrast empirically validates that the differentiable pathway is crucial for stable end-to-end index learning.
  • Figure 4: Long-tail Robustness Analysis. Performance (HitRate) across item popularity buckets (B0: Head $\to$ B4: Tail). (a) DGI maintains a robust and uniform performance profile across all buckets, demonstrating that our Isotropic Geometric Optimization effectively recalls long-tail items. (b) In contrast, the Two-Stage baseline with Dot Product suffers from a classic "rich-get-richer" pattern, where performance collapses in the tail buckets (B3-B4) due to severe popularity bias.
  • Figure 5: Topological Visualization of Learned Semantic Spaces (t-SNE). (a) DGI learns a highly Isotropic distribution on the hypersphere with well-separated clusters, confirming the geometric restoration capability of our Riemannian optimization. (b) The baseline space exhibits Representation Collapse and Anisotropy (indicated by red arrows), where embeddings crowd into a narrow cone, leaving the semantic space underutilized.
  • ...and 1 more figure
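
The Operational Unification step described in the Figure 2 caption can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions: the codebook values, the use of inner products as quantization logits, the temperature, and every function name are hypothetical, and plain Python has no automatic differentiation, so the sketch shows only why the path is smooth (a probability-weighted sum instead of a hard argmax), not the gradients themselves.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gumbel_softmax(logits, temperature, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / T)."""
    noisy = []
    for x in logits:
        # Clamp away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((x + gumbel) / temperature)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(item_vec, codebook, temperature, rng):
    """Return soft code probabilities and the soft quantized vector.

    The soft vector is a convex combination of codebook entries, which is
    what a decoder would receive under soft teacher forcing.
    """
    logits = [dot(item_vec, code) for code in codebook]  # assumed scoring
    probs = gumbel_softmax(logits, temperature, rng)
    dim = len(codebook[0])
    soft_vec = [sum(p * code[d] for p, code in zip(probs, codebook))
                for d in range(dim)]
    return probs, soft_vec


def prediction_head_logits(hidden, codebook):
    """Weight sharing: the output head reuses the codebook as its weights."""
    return [dot(hidden, code) for code in codebook]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical 3-code book
item_vec = [0.8, 0.3]

probs, soft_vec = soft_quantize(item_vec, codebook, temperature=0.5, rng=rng)
head_logits = prediction_head_logits(soft_vec, codebook)

print("soft code probabilities:", probs)
print("soft quantized vector:  ", soft_vec)
print("shared-head logits:     ", head_logits)
```

Lowering `temperature` pushes the probabilities toward a one-hot choice, while higher values keep them spread out. In an autodiff framework the same weighted sum would let a decoder loss send gradients back to the item encoder and the shared codebook.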

Theorems & Definitions (1)

  • Theorem 1