Multi-Object Navigation in real environments using hybrid policies

Assem Sadek, Guillaume Bono, Boris Chidlovskii, Atilla Baskurt, Christian Wolf

TL;DR

This work targets Multi-Object Navigation (Multi-ON) in real environments, addressing the sim2real gap by adopting a hybrid modular policy that separates geometry and semantics. The system maintains a metric EgoMap $\mathbf{M}_t$ aligned with a semantic point cloud, enabling robust exploration and goal retrieval through two-tier control: a learned exploration policy outputs 2D waypoints $\mathbf{p}_t=(x_t,y_t)$ on a heatmap $\mathbf{H}_t$, while a local D* planner executes trajectories on the same map; object detections from DeepLab v3 are mapped into the EgoMap. The exploration objective is trained with RL to maximize coverage using reward $r_t = \alpha (e_t - e_{t-1})$, with the waypoint $\mathbf{p}_t$ sampled from $\mathbf{H}_t$ and a subgoal-based training regime to stabilize learning. Real-robot experiments on a LoCoBot show 43% success and two episodes retrieving all three targets, highlighting improved sim2real transfer compared to end-to-end baselines and validating the approach's robustness in real conditions.
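The coverage reward $r_t = \alpha (e_t - e_{t-1})$ can be illustrated with a minimal Python sketch. It assumes that $e_t$ is the number of explored map cells at step $t$, which this summary does not state explicitly; the function names `explored_area` and `coverage_reward`, the grid layout and the value of `alpha` are hypothetical and not taken from the paper.

```python
def explored_area(explored_mask):
    """Count explored cells in a 2D grid of booleans (assumed definition of e_t)."""
    return sum(1 for row in explored_mask for cell in row if cell)


def coverage_reward(e_prev, e_curr, alpha=1.0):
    """Reward r_t = alpha * (e_t - e_{t-1}): pays only for newly covered area."""
    return alpha * (e_curr - e_prev)


# Toy 2x2 maps: one explored cell before the step, three after it.
mask_prev = [[True, False], [False, False]]
mask_curr = [[True, True], [False, True]]

reward = coverage_reward(explored_area(mask_prev), explored_area(mask_curr), alpha=0.5)
```

Under this assumption the reward is zero whenever a step reveals no new cells, so the policy is only paid for increasing coverage.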

Abstract

Navigation has been classically solved in robotics through the combination of SLAM and planning. More recently, beyond waypoint planning, problems involving significant components of (visual) high-level reasoning have been explored in simulated environments, mostly addressed with large-scale machine learning, in particular RL, offline-RL or imitation learning. These methods require the agent to learn various skills like local planning, mapping objects and querying the learned spatial representations. In contrast to simpler tasks like waypoint planning (PointGoal), for these more complex tasks the current state-of-the-art models have been thoroughly evaluated in simulation but, to the best of our knowledge, not yet in real environments. In this work we focus on sim2real transfer. We target the challenging Multi-Object Navigation (Multi-ON) task and port it to a physical environment containing real replicas of the originally virtual Multi-ON objects. We introduce a hybrid navigation method, which decomposes the problem into two different skills: (1) waypoint navigation is addressed with classical SLAM combined with a symbolic planner, whereas (2) exploration, semantic mapping and goal retrieval are handled by deep neural networks trained with a combination of supervised learning and RL. We show the advantages of this approach compared to end-to-end methods both in simulation and in a real environment, and outperform the SOTA for this task.

Paper Structure

This paper contains 5 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: We perform Multi-Object Navigation [DBLP:conf/nips/WaniPJCS20], i.e. the sequential visual search of multiple objects in a given order, and are the first to do this in real physical environments (a), which are characterized by a large sim2real gap. The gap is illustrated by two first-person views: (b) real and (c) simulation. We propose a hybrid method combining classical mapping and deep learning, and compare it to the SOTA methods on this task, which use end-to-end RL training and auxiliary losses [MarzaIROS2022].
  • Figure 2: An agent for multi-object navigation maintains a hybrid representation consisting of a metric bird's eye view map combined with a semantic point cloud. The agent switches between a trained exploration policy and symbolic waypoint selection, deferring low-level actions to a symbolic planner.
  • Figure 3: The exploration policy takes as input EgoMaps $\mathbf{M}_t$ and predicts a heatmap, which is limited/masked ($\odot$) to unexplored areas. The next waypoint $\mathbf{p}_t$ is sampled ($\sim$) from the resulting map $\mathbf{H}_t$.
  • Figure 4: Coverage (%) obtained by the exploration policy as a function of episode length (the number of simulation steps), compared to ANS [Chaplot2020Learning] and end-to-end RL baselines using egocentric input taken from [Chaplot2020Learning] on Gibson/Val.
  • Figure 5: A rollout of an episode with the hybrid model. From left to right: (1) RGB observation; (2) GT map with the GT goal positions, the current agent position and the current waypoint $\mathbf{p}_t$; (3) EgoMap $\mathbf{M}_t$ with the planned local path; and (4) a zoomed version. The initial goal is blue. At $t{=}6$, an exploration goal is predicted. The agent enters a new room, and at $t{=}17$ it detects the blue goal and switches to exploitation mode, advancing towards it. At $t{=}23$, it observes the very dark green goal and maps it for future use. A false positive example (white cylinder) was also detected.
  • ...and 1 more figures
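
The waypoint selection described in the Figure 3 caption (mask the predicted heatmap to unexplored areas, then sample $\mathbf{p}_t$ from the resulting map $\mathbf{H}_t$) can be sketched in Python as follows. This is an illustrative sketch only: the nested-list grid representation, the function name `sample_waypoint`, and the fallback of returning `None` when nothing is left to explore are assumptions, not details from the paper.

```python
import random


def sample_waypoint(predicted_heatmap, explored_mask, rng=None):
    """Mask the heatmap to unexplored cells and sample a waypoint (x, y).

    predicted_heatmap: 2D list of non-negative scores (assumed policy output).
    explored_mask:     2D list of booleans, True where the map is already explored.
    Returns None if no unexplored cell has positive score (assumed fallback).
    """
    rng = rng or random.Random()
    cells, weights = [], []
    for y, row in enumerate(predicted_heatmap):
        for x, score in enumerate(row):
            # Element-wise masking: explored cells get zero weight.
            weight = 0.0 if explored_mask[y][x] else max(score, 0.0)
            if weight > 0.0:
                cells.append((x, y))
                weights.append(weight)
    if not cells:
        return None
    # Sample one cell with probability proportional to its masked score.
    return rng.choices(cells, weights=weights, k=1)[0]


heatmap = [[0.9, 0.1], [0.5, 0.4]]
explored = [[True, True], [True, False]]
waypoint = sample_waypoint(heatmap, explored, rng=random.Random(0))
```

Sampling rather than taking the highest-scoring cell matches the caption's "sampled ($\sim$)" wording. In this toy case only one cell is unexplored, so the result does not depend on the random seed.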