Learning to navigate efficiently and precisely in real environments

Guillaume Bono; Hervé Poirier; Leonid Antsfeld; Gianluca Monaci; Boris Chidlovskii; Christian Wolf

TL;DR

Problem: closing the sim2real gap for end-to-end navigation policies trained in simulation. Method: introduce a fast, second-order dynamical model of the robot and integrate realistic sensing noise within Habitat, training policies to predict discretized velocity commands that are executed in closed loop. Contributions: an integrated motion model, dual localization signals, and thorough real-robot and large-scale simulation evaluations showing substantial gains over prior end-to-end methods. Significance: enables robust, efficient navigation in real environments by narrowing the sim2real gap through more realistic dynamics.

Abstract

In the context of autonomous navigation of terrestrial robots, the creation of realistic models for agent dynamics and sensing is common practice in the robotics literature and in commercial applications, where they are used for model-based control and/or for localization and mapping. The more recent Embodied AI literature, on the other hand, focuses on modular or end-to-end agents trained in simulators like Habitat or AI-Thor, where the emphasis is put on photo-realistic rendering and scene diversity, but high-fidelity robot motion is assigned a less privileged role. The resulting sim2real gap significantly impacts transfer of the trained models to real robotic platforms. In this work we explore end-to-end training of agents in simulation in settings which minimize the sim2real gap both in sensing and in actuation. Our agent directly predicts (discretized) velocity commands, which are maintained through closed-loop control in the real robot. The behavior of the real robot (including the underlying low-level controller) is identified and simulated in a modified Habitat simulator. Noise models for odometry and localization further contribute to lowering the sim2real gap. We evaluate on real navigation scenarios, explore different localization and point goal calculation methods, and report significant gains in performance and robustness compared to prior work.
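To make the idea of commanded velocities tracked by an identified closed-loop response more concrete, here is a minimal sketch of a two-rate simulation loop. The 3 Hz decision rate and 30 Hz physics rate come from the Figure 3 caption. Everything else is an assumption made for illustration: the damped second-order form of the velocity response, the `OMEGA_N` and `ZETA` values, the `State` class, and the function names are hypothetical and are not the identified model from the paper.

```python
import math
from dataclasses import dataclass

PHYSICS_HZ = 30                       # fast physics loop (Figure 3 caption)
DECISION_HZ = 3                       # slow sensing/decision loop (Figure 3 caption)
SUBSTEPS = PHYSICS_HZ // DECISION_HZ  # 10 physics steps per agent decision
DT = 1.0 / PHYSICS_HZ

# Hypothetical response parameters; a real system would identify these
# from recorded robot rollouts.
OMEGA_N = 6.0   # natural frequency in rad/s (assumed)
ZETA = 1.0      # damping ratio (assumed)


@dataclass
class State:
    x: float = 0.0      # position in m
    y: float = 0.0
    theta: float = 0.0  # heading in rad
    v: float = 0.0      # linear velocity in m/s
    v_dot: float = 0.0  # linear acceleration
    w: float = 0.0      # angular velocity in rad/s
    w_dot: float = 0.0  # angular acceleration


def second_order_step(value, rate, command, dt):
    """One semi-implicit Euler step of a damped second-order response
    that drives `value` toward `command`."""
    accel = OMEGA_N ** 2 * (command - value) - 2.0 * ZETA * OMEGA_N * rate
    rate = rate + accel * dt
    value = value + rate * dt
    return value, rate


def physics_step(s, v_cmd, w_cmd, dt=DT):
    """Advance velocities with the second-order response, then integrate
    the pose with a unicycle (differential drive) kinematic model."""
    s.v, s.v_dot = second_order_step(s.v, s.v_dot, v_cmd, dt)
    s.w, s.w_dot = second_order_step(s.w, s.w_dot, w_cmd, dt)
    s.x += s.v * math.cos(s.theta) * dt
    s.y += s.v * math.sin(s.theta) * dt
    s.theta += s.w * dt
    return s


def decision_step(s, v_cmd, w_cmd):
    """Hold one velocity command for a full decision interval (1/3 s)."""
    for _ in range(SUBSTEPS):
        physics_step(s, v_cmd, w_cmd)
    return s
```

With these assumed parameters, a robot starting at rest does not reach the commanded velocity within the first decision interval, which is the kind of lag a policy has to anticipate when it keeps moving between decisions.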

Paper Structure (11 sections, 9 equations, 11 figures, 4 tables)

Introduction
Related Work
End-to-end training with realistic sensing
A dynamical model of realistic robot motion
Experimental Results
Results
Conclusion
System identification
Agent architecture
Details on the fused simulated Lidar scan
Furniture rearrangement

Figures (11)

Figure 1: Efficient navigation with policies trained end-to-end in 3D photorealistic simulators requires closing the sim2real gap in sensing and actuation. Efficiency demands that the robot keeps moving while decisions are being made (as opposed to stopping for each sensing operation), which requires a realistic motion model in simulation so that the agent can internally anticipate its future state. This requirement is exacerbated by the delay between sensing ① and actuation ② caused by the computational complexity of high-capacity deep networks (visual encoders, policy). To model realistic motion while training in simulation, we create a $2^{nd}$-order dynamical model running at a higher frequency, which models the robot and its low-level closed-loop controller. We identify the model from real data and add it to the Habitat simulator (Savva et al., 2019).
Figure 2: The agent uses a recurrent policy with a static point goal $g_0$ as input, i.e. the goal is constant and given with respect to the initial reference frame. During training, the estimation of the dynamic point goal $\hat{g}_t$ is supervised from privileged information.
Figure 3: Training visual navigation with realistic motion models: we train an end-to-end agent in simulation (top) subject to two different simulation loops: a slower loop at 3 Hz (indexed by $t$) renders visual observations and takes agent decisions, while a faster loop at 30 Hz (indexed by $\tau$) simulates physics. Physics is approximated with a $2^{nd}$-order model identified from real robot rollouts (bottom) and includes the robot physics as well as the behavior of the closed-loop control of the differential drive (neither the onboard control algorithm nor the control frequency needs to be known). Operations in the intervals are pipelined, e.g. sensing occurs at each time step, as does the agent forward pass. The agent architecture is detailed in Figure 2.
Figure 4: The action space: 28 actions, combining 4 choices of linear velocity $\in[0,1]$ m/s with 7 choices of angular velocity $\in[-3,3]$ rad/s. Arrows show the effect on the pose of an action held for $\frac{2}{3}$ s.
Figure 5: Robustness. Left: the end-to-end trained agent is surprisingly robust and allows navigation very close to finely structured obstacles. Right: Navigation based on single-ray Lidar and planning on occupancy maps, widely used in ROS-based solutions, is difficult and error-prone when obstacles are thin at the height of the Lidar ray: they go undetected, which leads to collisions.
...and 6 more figures
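
The action space described in the Figure 4 caption can be sketched as a grid of velocity pairs. The caption only gives the counts (4 and 7) and the ranges; the uniform spacing used below, the helper names `linspace` and `pose_after`, and the idealized assumption that the commanded velocity is reached instantly are all illustrative choices, not details taken from the paper.

```python
import math
from itertools import product


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Assumed uniform grids; the source states only counts and ranges.
LINEAR = linspace(0.0, 1.0, 4)     # m/s
ANGULAR = linspace(-3.0, 3.0, 7)   # rad/s

# Every (linear, angular) pair is one discrete action: 4 * 7 = 28.
ACTIONS = list(product(LINEAR, ANGULAR))


def pose_after(v, w, duration=2.0 / 3.0):
    """Pose (x, y, theta) in the robot's start frame after holding
    constant velocities (v, w) for `duration` seconds.

    Idealized unicycle kinematics: the velocity is assumed to be reached
    instantly, so the lag of the real controller is ignored here.
    """
    if abs(w) < 1e-9:
        # Straight-line motion
        return v * duration, 0.0, 0.0
    theta = w * duration
    radius = v / w
    # Motion along a circular arc of signed radius v / w
    return radius * math.sin(theta), radius * (1.0 - math.cos(theta)), theta


if __name__ == "__main__":
    for v, w in ACTIONS:
        x, y, theta = pose_after(v, w)
        print(f"v={v:.2f} m/s  w={w:+.1f} rad/s  ->  "
              f"x={x:+.3f}  y={y:+.3f}  theta={theta:+.3f}")
```

Comparing these idealized end poses with those produced by a lagged motion model gives a rough sense of how much pose error a purely kinematic simulator would leave for the policy to absorb.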
