OpenBench: A New Benchmark and Baseline for Semantic Navigation in Smart Logistics

Junhui Wang, Dongjie Huo, Zehui Xu, Yongliang Shi, Yimin Yan, Yuanxin Wang, Chao Gao, Yan Qiao, Guyue Zhou

TL;DR

The paper presents OPEN, a GPS-free outdoor semantic navigation framework that fuses OpenStreetMap with Large Language Models and Vision-Language Models to enable scalable last-mile delivery without pre-mapping. It introduces a benchmark tailored to residential outdoor navigation, along with a baseline system that plans tasks, generates waypoints, localizes globally with VLMs, and updates maps online. Through extensive simulation and real-world experiments, OPEN demonstrates improved navigation efficiency, reliability, and long-term performance compared to learning-based baselines, while maintaining lightweight map storage. The work offers practical impact for deploying autonomous delivery robots in urban environments and provides public code and benchmarks to accelerate research. The combination of OSM-based routing, GPS-free localization, and continuous map enrichment addresses key deployment barriers in smart logistics.

Abstract

The increasing demand for efficient last-mile delivery in smart logistics underscores the role of autonomous robots in enhancing operational efficiency and reducing costs. Traditional navigation methods, which depend on high-precision maps, are resource-intensive, while learning-based approaches often struggle with generalization in real-world scenarios. To address these challenges, this work proposes the Openstreetmap-enhanced oPen-air sEmantic Navigation (OPEN) system that combines foundation models with classic algorithms for scalable outdoor navigation. The system uses off-the-shelf OpenStreetMap (OSM) for flexible map representation, thereby eliminating the need for extensive pre-mapping efforts. It also employs Large Language Models (LLMs) to comprehend delivery instructions and Vision-Language Models (VLMs) for global localization, map updates, and house number recognition. To compensate for the limitations of existing benchmarks, which are inadequate for assessing last-mile delivery, this work introduces a new benchmark specifically designed for outdoor navigation in residential areas, reflecting the real-world challenges faced by autonomous delivery systems. Extensive experiments in simulated and real-world environments demonstrate the proposed system's efficacy in enhancing navigation efficiency and reliability. To facilitate further research, our code and benchmark are publicly available.

Paper Structure

This paper contains 32 sections, 4 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Overview of the proposed benchmark framework. The diagram presents the simulation environments and the corresponding OSM, which are provided for implementing semantic navigation systems. The framework requires the navigation system to process natural language instructions autonomously and to navigate accurately from the starting point to the designated customer’s front door.
  • Figure 2: Simulation environment for last-mile delivery.
  • Figure 3: Overview of the OPEN system for autonomous last-mile delivery. The system starts from a natural language delivery request, which is processed by a task planning module powered by an LLM. This module interacts with OSM to extract destination details and generates a structured task sequence. The robot autonomously decides between navigation and exploration modes, generating waypoints for execution by a classical planner. Localization is performed using classical methods, with global localization enhanced by integrating the MobileSAM and CLIP models with OSM to correct positional errors. The robot also updates OSM with newly detected objects, continuously improving map detail and navigation performance for subsequent deliveries.
  • Figure 4: The robot used in real-world navigation experiments.
  • Figure 5: Illustration of the real-world experiment. The top-left part presents the OSM and target buildings. The bottom-left part displays the delivery instructions. The right side of the figure shows the navigation trajectories of different methods.
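
The waypoint-generation step described in the Figure 3 caption can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy road graph in local metric coordinates rather than real OSM data, uses Dijkstra's algorithm as a stand-in for whichever route search the system actually uses, and all node names, coordinates, and function names (`NODES`, `WAYS`, `build_graph`, `plan_waypoints`) are hypothetical.

```python
import heapq
import math

# Hypothetical toy map: node id -> (x, y) in metres.
# Real OSM nodes carry latitude/longitude; this is an assumption for brevity.
NODES = {
    "gate": (0.0, 0.0),
    "junction_a": (30.0, 0.0),
    "junction_b": (30.0, 40.0),
    "building_7": (60.0, 40.0),
    "detour": (0.0, 50.0),
}

# Hypothetical undirected ways connecting pairs of nodes.
WAYS = [
    ("gate", "junction_a"),
    ("junction_a", "junction_b"),
    ("junction_b", "building_7"),
    ("gate", "detour"),
    ("detour", "junction_b"),
]


def build_graph(nodes, ways):
    """Build an adjacency list weighted by Euclidean distance."""
    graph = {name: [] for name in nodes}
    for a, b in ways:
        d = math.dist(nodes[a], nodes[b])
        graph[a].append((b, d))
        graph[b].append((a, d))
    return graph


def plan_waypoints(nodes, ways, start, goal):
    """Return (waypoints, route_length) for the shortest route via Dijkstra.

    Waypoints are (node_id, (x, y)) pairs that a local planner could follow.
    Returns ([], inf) when the goal cannot be reached.
    """
    graph = build_graph(nodes, ways)
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, d in graph[node]:
            new_cost = cost + d
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    if goal not in best:
        return [], math.inf
    # Walk parent links back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    path.reverse()
    return [(name, nodes[name]) for name in path], best[goal]


if __name__ == "__main__":
    waypoints, length = plan_waypoints(NODES, WAYS, "gate", "building_7")
    for name, xy in waypoints:
        print(name, xy)
    print("route length (m):", length)
```

In this toy map the route through `junction_a` is shorter than the one through `detour`, so the planner returns the former. A real system would additionally need to convert geographic coordinates into the robot's frame and handle the exploration mode that the caption mentions, neither of which is modelled here.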