An Empirical Survey and Benchmark of Learned Distance Indexes for Road Networks

Gautam Choudhary, Libin Zhou, Yeasir Rayhan, Walid G. Aref

TL;DR

The paper addresses the need for efficient distance queries on road networks by evaluating a broad set of learned distance indexes against classical baselines. It introduces an encoder–decoder framework that unifies diverse methods, benchmarks ten techniques on seven real-world road networks using workload-driven query data, and measures approximation error, preprocessing time, query latency, and storage. Key findings: CatBoost achieves the highest accuracy, RNE delivers low query latency among learned methods, LandmarkNN offers a favorable accuracy–latency balance, and GPU inference provides substantial speedups. The work releases open-source tooling for reproducibility and highlights practical considerations for deploying learned distance indexes in real-time routing and analytics workloads.
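The encoder-decoder view can be illustrated with a minimal, non-learned Python sketch. It assumes a connected, undirected, weighted graph stored as adjacency lists. The encoder embeds each vertex as its vector of exact distances to a few landmark vertices. The decoder combines the triangle-inequality lower and upper bounds into a single estimate. The names dijkstra, encode, and decode, the toy graph, the landmark choice, and the midpoint decoder are hypothetical illustrations. They are not the paper's implementation or the design of any benchmarked method.

```python
import heapq

# Hypothetical toy representation: undirected graph, vertex -> [(neighbor, weight)].
Graph = dict[int, list[tuple[int, float]]]


def dijkstra(graph: Graph, source: int) -> dict[int, float]:
    """Exact single-source shortest-path distances (the classical baseline)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already settled
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def encode(graph: Graph, landmarks: list[int]) -> dict[int, list[float]]:
    """Encoder: embed every vertex as its distances to the landmarks.

    Assumes a connected graph; an unreachable vertex would raise KeyError.
    """
    tables = [dijkstra(graph, lm) for lm in landmarks]
    return {v: [t[v] for t in tables] for v in graph}


def decode(es: list[float], et: list[float]) -> float:
    """Decoder: midpoint of the triangle-inequality bounds on d(s, t)."""
    lower = max(abs(a - b) for a, b in zip(es, et))  # |d(l,s) - d(l,t)| <= d(s,t)
    upper = min(a + b for a, b in zip(es, et))       # d(s,t) <= d(l,s) + d(l,t)
    return (lower + upper) / 2


if __name__ == "__main__":
    g: Graph = {
        0: [(1, 1.0), (2, 4.0)],
        1: [(0, 1.0), (2, 2.0), (3, 5.0)],
        2: [(0, 4.0), (1, 2.0), (3, 1.0)],
        3: [(1, 5.0), (2, 1.0)],
    }
    emb = encode(g, landmarks=[0, 3])
    print(decode(emb[1], emb[2]))  # 3.0, while the exact distance is 2.0
```

In a learned index, the fixed midpoint rule would be replaced by a trained model (for example, a neural network or gradient-boosted trees) that maps a pair of embeddings to a distance. Depending on the method, the encoder may also be learned.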

Abstract

The calculation of shortest-path distances in road networks is a core operation in navigation systems, location-based services, and spatial analytics. Although classical algorithms, e.g., Dijkstra's algorithm, provide exact answers, their latency is prohibitive for modern real-time, large-scale deployments. Over the past two decades, numerous distance indexes have been proposed to speed up shortest-distance queries. More recently, with advances in machine learning (ML), researchers have proposed ML-based distance indexes that answer approximate shortest-path and distance queries efficiently. However, a comprehensive and systematic evaluation of these ML-based approaches is lacking. This paper presents the first empirical survey of ML-based distance indexes on road networks, evaluating them along four key dimensions: training time, query latency, storage, and accuracy. Using seven real-world road networks and workload-driven query datasets derived from trajectory data, we benchmark ten representative ML techniques and compare them against strong classical non-ML baselines, highlighting key insights and practical trade-offs. We release a unified open-source codebase to support reproducibility and future research on learned distance indexes.
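The four evaluation dimensions can be measured with a small harness, sketched below in Python. It assumes three things. First, an index is built by a zero-argument build function. Second, the index is queried through a query(index, s, t) function. Third, exact distances for the workload are precomputed, for example with Dijkstra's algorithm. The evaluate and mean_relative_error helpers, the pickle-size storage proxy, and the relative-error definition are hypothetical choices for illustration, not the paper's benchmark code.

```python
import pickle
import time
from statistics import mean
from typing import Callable, Sequence

Query = tuple[int, int]


def mean_relative_error(est: Sequence[float], exact: Sequence[float]) -> float:
    """Average of |estimate - exact| / exact, skipping zero-distance queries."""
    return mean(abs(e - x) / x for e, x in zip(est, exact) if x > 0)


def evaluate(build: Callable[[], object],
             query: Callable[[object, int, int], float],
             workload: Sequence[Query],
             exact: Sequence[float]) -> dict[str, float]:
    # Training / preprocessing time.
    t0 = time.perf_counter()
    index = build()
    build_s = time.perf_counter() - t0

    # Query latency: mean wall-clock time per query over the whole workload.
    t0 = time.perf_counter()
    est = [query(index, s, t) for s, t in workload]
    latency_us = (time.perf_counter() - t0) / len(workload) * 1e6

    return {
        "build_s": build_s,
        "latency_us": latency_us,
        # Storage proxy (assumption): size of the pickled index in bytes.
        "storage_bytes": float(len(pickle.dumps(index))),
        "mre": mean_relative_error(est, exact),
    }


if __name__ == "__main__":
    workload = [(0, 1), (1, 2), (0, 3)]
    exact = [1.0, 2.0, 4.0]

    def build() -> dict[Query, float]:
        # Hypothetical "index": a lookup table that overestimates by 10%.
        return {q: 1.1 * d for q, d in zip(workload, exact)}

    def query(index: object, s: int, t: int) -> float:
        return index[(s, t)]

    report = evaluate(build, query, workload, exact)
    print(round(report["mre"], 3))  # 0.1
```

Timing the whole workload and dividing by its size, rather than timing each call separately, reduces timer overhead for sub-microsecond queries. Batched or GPU inference would need its own timing around the batch call.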

Paper Structure (29 sections, 4 equations, 2 figures, 7 tables)

Figures (2)

  • Figure 1: Encoder-decoder for shortest-distance estimation.
  • Figure 2: Test MRE as a function of training time for three representative datasets (small, medium, and large road networks).