Exploration Without Maps via Zero-Shot Out-of-Distribution Deep Reinforcement Learning

Shathushan Sivashangaran, Apoorva Khairnar, Azim Eskandarian

TL;DR

The paper tackles autonomous exploration in GPS-denied, map-free environments by training end-to-end DRL policies in a high-fidelity simulator and transferring them zero-shot to real robots. Using a compact network of two fully connected layers with 64 nodes each and depth-based observations, the method learns time-efficient, obstacle-avoiding navigation that generalizes to unstructured terrain and dynamic obstacles without explicit maps. Key contributions include emergent out-of-distribution generalization from a constrained training environment operated at physical limits, a reward design that bridges sim-to-real differences, and analyses of observation-space curation and nonlinear training dynamics, with open-source tooling and real-world validation on XTENTH-CAR. The approach promises computational efficiency and broad applicability across AMRs, enabling robust exploration and data collection in diverse, GPS-denied settings.

Abstract

Operating Autonomous Mobile Robots (AMRs) of all forms, including wheeled ground vehicles, quadrupeds and humanoids, in dynamically changing, GPS-denied environments without a priori maps, using only onboard sensors, is an unsolved problem. Solving it has the potential to transform the economy and vastly improve humanity's capabilities in agriculture, manufacturing, disaster response, military, and space exploration. Conventional AMR automation is modularized into perception, motion planning and control, which is computationally inefficient and requires explicit feature extraction and engineering that inhibits generalization and deployment at scale. Few works have focused on real-world end-to-end approaches that directly map sensor inputs to control outputs. Supervised Deep Learning (DL) requires large amounts of well-curated training data that are time-consuming and labor-intensive to collect and label, while Deep Reinforcement Learning (DRL) is sample inefficient and faces challenges in bridging the simulation-to-reality gap. This paper presents a novel method to efficiently train DRL for robust end-to-end AMR exploration in a constrained environment at physical limits in simulation, with the policy transferred zero-shot to the real world. The representation, learned in a compact parameter space of 2 fully connected layers with 64 nodes each, is demonstrated to exhibit emergent behavior for out-of-distribution generalization to navigation in new environments that include unstructured terrain without maps, and to dynamic obstacle avoidance. The learned policy outperforms conventional navigation algorithms while consuming a fraction of the computation resources, enabling execution on a range of AMR forms with varying embedded computer payloads.
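
To give a sense of how small a policy with 2 fully connected layers of 64 nodes each is, the following minimal sketch builds such a network in plain Python and maps one observation to control outputs. Only the two hidden layers of 64 nodes come from the abstract. The observation size (`OBS_DIM`), the two outputs (`ACT_DIM`), the tanh activations and the random initialization are assumptions made for illustration, and the weights here are untrained, so this is not the authors' implementation.

```python
import math
import random

# Only HIDDEN comes from the abstract; the other sizes are hypothetical.
OBS_DIM = 20   # assumed number of depth readings per observation
HIDDEN = 64    # two hidden layers of 64 nodes each, as stated in the abstract
ACT_DIM = 2    # assumed outputs, e.g. steering and throttle


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small uniform random weights."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def make_policy(seed=0):
    """Build an untrained policy: OBS_DIM -> 64 -> 64 -> ACT_DIM."""
    rng = random.Random(seed)
    sizes = [OBS_DIM, HIDDEN, HIDDEN, ACT_DIM]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def act(policy, observation):
    """Map a sensor observation directly to bounded control outputs."""
    h = observation
    for layer in policy:
        # tanh (an assumed activation) keeps every output in [-1, 1]
        h = dense(h, layer, math.tanh)
    return h


def count_params(policy):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in policy)


policy = make_policy()
action = act(policy, [1.0] * OBS_DIM)
n_params = count_params(policy)
```

Under these assumed input and output sizes the network holds only a few thousand parameters, which is consistent with the abstract's claim that the policy can run on AMRs with modest embedded computers.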

Paper Structure

This paper contains 24 sections, 18 equations, 33 figures, 1 table.

Figures (33)

  • Figure 1: XTENTH-CAR wheeled mobile robot.
  • Figure 2: XTENTH-CAR digital twin.
  • Figure 3: Methodology for training and deployment. The DRL model was trained in simulation in a constrained racetrack at physical limits and transferred zero-shot to the real-world for out-of-distribution generalization to new racetrack layouts, exploration in unstructured terrain and dynamic obstacle avoidance.
  • Figure 4: Multidirectional racetrack used for training.
  • Figure 5: Simulated outdoor environment used for evaluation of out-of-distribution generalization.
  • ...and 28 more figures