
Geodesic Semantic Search: Learning Local Riemannian Metrics for Citation Graph Retrieval

Brandon Yee, Lucas Wang, Kundana Kommini, Krishna Sharma

TL;DR

A retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search, together with a theoretical analysis of when geodesic distances outperform direct similarity, a characterization of the approximation quality of low-rank metrics, and empirical validation of these predictions.

Abstract

We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search. Unlike standard embedding-based retrieval, which relies on fixed Euclidean distances, GSS learns a low-rank factor $\mathbf{L}_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive-definite metric $\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^\top + \epsilon \mathbf{I}$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval runs multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance (MMR) reranking and path coherence filtering. On citation prediction benchmarks with 169K papers, GSS achieves a 23% relative improvement in Recall@20 over SPECTER+FAISS baselines while providing interpretable citation paths. Our hierarchical coarse-to-fine search with k-means pooling reduces computational cost by 4$\times$ compared to flat geodesic search while retaining 97% of its retrieval quality. We provide a theoretical analysis of when geodesic distances outperform direct similarity, characterize the approximation quality of low-rank metrics, and validate these predictions empirically. Code and trained models are available at https://github.com/YCRG-Labs/geodesic-search.
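The MMR reranking step can be sketched with the standard greedy Maximal Marginal Relevance rule: at each step, pick the candidate that maximizes `lam * relevance - (1 - lam) * redundancy`. This is a minimal sketch, assuming that relevance is $1/(1 + \text{geodesic distance})$ for each candidate returned by Dijkstra and that redundancy is the maximum cosine similarity to papers already selected. The trade-off weight `lam` and the helper names `mmr_rerank` and `cosine` are hypothetical and are not taken from the GSS code release.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def mmr_rerank(
    geo_dist: Dict[int, float],
    emb: Dict[int, Sequence[float]],
    k: int,
    lam: float = 0.7,
) -> List[int]:
    """Greedily pick up to k candidates by Maximal Marginal Relevance.

    Assumptions (hypothetical, not from the GSS code): relevance is
    1 / (1 + geodesic distance); redundancy is the maximum cosine
    similarity to any paper already selected.
    """
    relevance = {n: 1.0 / (1.0 + d) for n, d in geo_dist.items()}
    selected: List[int] = []
    remaining = sorted(geo_dist)  # sorted ids make tie-breaking deterministic

    def mmr_score(n: int) -> float:
        redundancy = max((cosine(emb[n], emb[s]) for s in selected), default=0.0)
        return lam * relevance[n] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)  # first maximum wins ties
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam = 1.0` the rule reduces to ranking by geodesic relevance alone; lowering `lam` trades relevance for diversity, so a near-duplicate of an already selected paper is pushed down the list.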

Paper Structure

This paper contains 68 sections, 5 theorems, 27 equations, 2 figures, 12 tables, and 1 algorithm.

Key Result

Proposition 1

For any $\mathbf{L}_i \in \mathbb{R}^{d \times r}$ with $r \leq d$ and $\epsilon > 0$, the matrix $\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^\top + \epsilon \mathbf{I}$ is symmetric positive definite with minimum eigenvalue at least $\epsilon$.
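A standard argument for Proposition 1 (not necessarily the proof given in the paper): symmetry is immediate because $(\mathbf{L}_i \mathbf{L}_i^\top + \epsilon \mathbf{I})^\top = \mathbf{L}_i \mathbf{L}_i^\top + \epsilon \mathbf{I}$, and the quadratic form is bounded below by $\epsilon$ times the squared norm of any vector:

```latex
\begin{aligned}
\mathbf{v}^\top \mathbf{G}_i \mathbf{v}
  &= \mathbf{v}^\top \mathbf{L}_i \mathbf{L}_i^\top \mathbf{v} + \epsilon\, \mathbf{v}^\top \mathbf{v} \\
  &= \lVert \mathbf{L}_i^\top \mathbf{v} \rVert_2^2 + \epsilon \lVert \mathbf{v} \rVert_2^2
  \;\ge\; \epsilon \lVert \mathbf{v} \rVert_2^2
  \qquad \text{for all } \mathbf{v} \in \mathbb{R}^d .
\end{aligned}
```

By the Rayleigh quotient, $\lambda_{\min}(\mathbf{G}_i) \ge \epsilon > 0$. When $r < d$, $\mathbf{L}_i^\top$ has a nontrivial null space, so the bound is attained and $\lambda_{\min}(\mathbf{G}_i) = \epsilon$ exactly.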

Figures (2)

  • Figure 1: MetricGAT architecture. Input SPECTER features are processed through graph attention layers, then split into embedding and metric heads.
  • Figure 2: Visualization of learned local metrics. (a) Metric ellipsoids vary by field. (b) Metric variance correlates with local graph density.
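A common way to draw a local metric $\mathbf{G}_i$ as an ellipsoid, as in Figure 2(a), is to plot its unit ball $\{\mathbf{v} : \mathbf{v}^\top \mathbf{G}_i \mathbf{v} = 1\}$, whose semi-axes have length $1/\sqrt{\lambda_k}$ along the eigenvectors of $\mathbf{G}_i$. The sketch below assumes that convention and a 2-D view ($d = 2$) so the eigendecomposition has a closed form; whether Figure 2 uses exactly this construction is an assumption, and the helpers `local_metric` and `ellipse_axes` are hypothetical.

```python
import math
from typing import List, Tuple


def local_metric(L: List[List[float]], eps: float) -> List[List[float]]:
    """Return G = L L^T + eps * I for a d x r factor L given as a list of d rows."""
    d = len(L)
    return [
        [
            sum(a * b for a, b in zip(L[i], L[j])) + (eps if i == j else 0.0)
            for j in range(d)
        ]
        for i in range(d)
    ]


def ellipse_axes(G: List[List[float]]) -> Tuple[float, float, float]:
    """Unit ball {v : v^T G v = 1} of a 2x2 symmetric positive-definite G.

    Returns (long semi-axis, short semi-axis, angle of the long axis in radians).
    """
    a, b, c = G[0][0], G[0][1], G[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_max, lam_min = mean + radius, mean - radius  # closed-form eigenvalues
    # The eigenvector of lam_max points at 0.5 * atan2(2b, a - c); the long axis
    # (the cheapest direction to move in) is perpendicular to it.
    theta_max = 0.5 * math.atan2(2.0 * b, a - c)
    return 1.0 / math.sqrt(lam_min), 1.0 / math.sqrt(lam_max), theta_max + math.pi / 2
```

For example, $\mathbf{L}_i = (1, 0)^\top$ with $\epsilon = 0.25$ gives $\mathbf{G}_i = \mathrm{diag}(1.25, 0.25)$: the ellipse has semi-axis 2 along the direction that $\mathbf{L}_i$ ignores and semi-axis $1/\sqrt{1.25}$ along the direction it penalizes. Since $r < d$ here, the smallest eigenvalue equals $\epsilon$, so no semi-axis can exceed $1/\sqrt{\epsilon}$.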

Theorems & Definitions (12)

  • Definition 1: Local Riemannian Metric
  • Definition 2: Graph Geodesic Distance
  • Proposition 1: Low-Rank Metric Guarantee
  • Proof
  • Proposition 2: Hierarchical Approximation
  • Theorem 3: Geodesic Advantage Condition
  • Proof (sketch)
  • Theorem 4: Low-Rank Metric Approximation
  • Proposition 5: Smoothness-Coherence Connection
  • Proof
  • ...and 2 more
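The exact statements of Definitions 1 and 2 are not reproduced in this summary, so the following is only a minimal sketch of geodesic retrieval under stated assumptions. Each citation edge $(i, j)$, treated as undirected, is weighted by the length of $\mathbf{x}_j - \mathbf{x}_i$ under the averaged metric $\tfrac{1}{2}(\mathbf{G}_i + \mathbf{G}_j)$, where $\mathbf{x}_i$ is a node embedding. Multi-source Dijkstra then starts from several seed papers (for example, nearest neighbours of the query embedding) with given initial offsets. The edge rule, the seeding strategy, and the names `metric_edge_weight` and `multi_source_dijkstra` are hypothetical, not the paper's definitions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def metric_edge_weight(x_i: Vector, x_j: Vector, G_i: Matrix, G_j: Matrix) -> float:
    """Length of x_j - x_i under (G_i + G_j) / 2 (an assumed edge rule)."""
    delta = [a - b for a, b in zip(x_j, x_i)]
    d = len(delta)
    quad = sum(
        delta[p] * 0.5 * (G_i[p][q] + G_j[p][q]) * delta[q]
        for p in range(d)
        for q in range(d)
    )
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative rounding error


def multi_source_dijkstra(
    adj: Dict[int, List[int]],
    weight: Dict[Tuple[int, int], float],
    seeds: Dict[int, float],
    max_results: int,
) -> Dict[int, float]:
    """Settle up to max_results nodes in order of distance from the nearest seed.

    seeds maps each seed node to its initial offset (e.g. query-to-seed distance).
    """
    dist: Dict[int, float] = {}
    heap = [(d0, n) for n, d0 in seeds.items()]
    heapq.heapify(heap)
    while heap and len(dist) < max_results:
        d, u = heapq.heappop(heap)
        if u in dist:  # already settled via a shorter path
            continue
        dist[u] = d
        for v in adj.get(u, []):
            if v not in dist:
                heapq.heappush(heap, (d + weight[(u, v)], v))
    return dist
```

Because every seed enters the same priority queue, each node is settled once, at its distance from the nearest seed, and the search can stop after `max_results` nodes instead of exploring the whole graph. An anisotropic metric such as $\mathrm{diag}(4, 1)$ makes horizontal edges twice as long as vertical ones, which changes which neighbours are reached first.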