Reranker Optimization via Geodesic Distances on k-NN Manifolds
Wen G. Gong
TL;DR
RAG systems often rely on costly cross-encoders, motivating a geometric reranking approach. Maniscope performs a two-stage process: first retrieving a small candidate set via cosine similarity, then refining with geodesic distances computed on a flat $k$-NN manifold, blending global and local signals with $score(c_i) = \alpha \cdot sim_{cos}(q,c_i) + (1-\alpha) \cdot sim_{geo}(a,c_i)$. Across eight BEIR datasets (1,233 queries), it outperforms the HNSW baseline on the hardest tasks while delivering sub-10 ms latency and remaining within 2% of cross-encoder accuracy at 10–45× lower latency; an LLM-based reranker represents an upper bound with far higher latency. The work demonstrates that incorporating local geometric structure into reranking enables practical, real-time RAG deployments and provides an open-source implementation for broader adoption.
Abstract
Current neural reranking approaches for retrieval-augmented generation (RAG) rely on cross-encoders or large language models (LLMs), requiring substantial computational resources and exhibiting latencies of 3-5 seconds per query. We propose Maniscope, a geometric reranking method that computes geodesic distances on k-nearest neighbor (k-NN) manifolds constructed over retrieved document candidates. This approach combines global cosine similarity with local manifold geometry to capture semantic structure that flat Euclidean metrics miss. Evaluating on eight BEIR benchmark datasets (1,233 queries), Maniscope outperforms HNSW graph-based baseline on the three hardest datasets (NFCorpus: +7.0%, TREC-COVID: +1.6%, AorB: +2.8% NDCG@3) while being 3.2x faster (4.7 ms vs 14.8 ms average). Compared to cross-encoder rerankers, Maniscope achieves within 2% accuracy at 10-45x lower latency. On TREC-COVID, LLM-Reranker provides only +0.5% NDCG@3 improvement over Maniscope at 840x higher latency, positioning Maniscope as a practical alternative for real-time RAG deployment. The method requires O(N D + M^2 D + M k log k) complexity where M << N , enabling sub-10 ms latency. We plan to release Maniscope as open-source software.
