Bench2FreeAD: A Benchmark for Vision-based End-to-end Navigation in Unstructured Robotic Environments

Yuhang Peng, Sidong Wang, Jihaoyu Yang, Shilong Li, Han Wang, Jiangtao Gong

TL;DR

Bench2FreeAD addresses the problem of vision-based end-to-end navigation in unstructured environments by introducing the FreeWorld dataset (real and synthetic) and a BEV-based benchmark. The approach fine-tunes two E2E drivers, VAD and LAW, on this data and finds that LAW, which uses implicit representations, achieves lower $L_2$ trajectory error and lower collision rates than VAD, especially when augmented with real data. The study reveals that virtual and real data occupy a similar distribution from the learner's perspective, and real data further reduces collisions, highlighting practical value for logistics and service robots. Overall, the benchmark offers a solid foundation for developing robust E2E navigation in challenging spaces and for future closed-loop deployments.
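
As a rough illustration of the two open-loop metrics named above, the sketch below computes an $L_2$ trajectory error and a collision rate on toy 2D waypoints. It is not the benchmark's evaluation code: the function names `l2_error` and `collision_rate`, the point-obstacle model, and the `radius` threshold are all assumptions made for this example, and the paper's exact metric definitions (horizons, averaging, BEV occupancy checks) may differ.

```python
import math


def l2_error(pred, gt):
    """Mean Euclidean distance between matching predicted and ground-truth waypoints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def collision_rate(trajectories, obstacles, radius):
    """Fraction of trajectories with at least one waypoint closer than `radius`
    to any obstacle point (hypothetical simplification: obstacles are points)."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    hits = sum(
        any(math.dist(p, o) < radius for p in traj for o in obstacles)
        for traj in trajectories
    )
    return hits / len(trajectories)


# Toy data: waypoints are (x, y) positions in metres in the ego frame.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
gt = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(l2_error(pred, gt))

trajs = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 5.0), (1.0, 5.0)]]
print(collision_rate(trajs, obstacles=[(1.0, 0.2)], radius=0.5))
```

Lower values are better for both quantities; under these toy definitions, a planner that tracks the ground truth closely can still collide, which is why the two are reported separately.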

Abstract

Most current end-to-end (E2E) autonomous driving algorithms are built for standard vehicles in structured transportation scenarios and leave robot navigation in unstructured scenarios, such as auxiliary roads, campus roads, and indoor settings, largely unexplored. This paper investigates E2E robot navigation in unstructured road environments. First, we introduce two data collection pipelines, one for real-world robot data and another for synthetic data generated with the Isaac Sim simulator, which together produce an unstructured robotics navigation dataset, the FreeWorld Dataset. Second, we fine-tune an efficient E2E autonomous driving model, VAD, on our datasets to validate the performance and adaptability of E2E autonomous driving models in these environments. Results demonstrate that fine-tuning on our datasets significantly enhances the navigation potential of E2E autonomous driving models in unstructured robotic environments. Thus, this paper presents the first dataset targeting E2E robot navigation tasks in unstructured scenarios, and provides a benchmark based on vision-based E2E autonomous driving algorithms to facilitate the development of E2E navigation technology for logistics and service robots. The project is available on GitHub.

Paper Structure

This paper contains 27 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overview of the real vehicle setup, sensor coordinate system, and camera group field of view.
  • Figure 2: An example from the FreeWorld dataset, showing 6 different camera views and lidar data, as well as the human-annotated semantic map. The human-written scene description is shown at the bottom.
  • Figure 3: The upper part illustrates the grid-based heat occupancy map and its vectorized representation generated in the Unity simulation environment. The lower part shows 3D bounding boxes of object instances (e.g., humans and cars) produced by the Unity Perception package.
  • Figure 4: Qualitative results of VAD(r). VAD(r) generates vectorized representations of unstructured scenes and predicts 3D bounding boxes for people.
  • Figure 5: Qualitative results of VAD on the FreeWorld dataset.
  • ...and 3 more figures