NeRP: Neural Rearrangement Planning for Unknown Objects

Ahmed H. Qureshi, Arsalan Mousavian, Chris Paxton, Michael C. Yip, Dieter Fox

TL;DR

NeRP tackles long-horizon rearrangement of unknown objects by modeling the scene as a graph of segmented objects and using a set-based, multi-network planner to predict and execute sequences of pick-and-place actions. The system is trained in simulation and demonstrated to generalize to real-world tasks, outperforming model-based and heuristic baselines in unseen scenarios. Key contributions include a graph-encoder with k-GNNs, an object-selection network, a stochastic delta-proposal network, a goal-satisfaction evaluator, and a collision detector, all integrated through a model-predictive, sampling-based planning loop. Despite sim-to-real challenges due to perception noise, NeRP shows strong generalization to different object counts and unseen rearrangements, with ablations highlighting the importance of stochasticity and component cooperation.

Abstract

Robots will be expected to manipulate a wide variety of objects in complex and arbitrary ways as they become more widely used in human environments. As such, the rearrangement of objects has been noted to be an important benchmark for AI capabilities in recent years. We propose NeRP (Neural Rearrangement Planning), a deep-learning-based approach to multi-step object rearrangement planning that works with never-before-seen objects, is trained on simulation data, and generalizes to the real world. We compare NeRP to several naive and model-based baselines, demonstrating that our approach is measurably better and can arrange unseen objects in fewer steps and with less planning time. Finally, we demonstrate it on several challenging rearrangement problems in the real world.

Paper Structure

This paper contains 15 sections, 9 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Model architecture overview. We use UCN [xiang2020learning] to segment out unique objects in the scene, and then compute latent embeddings $\boldsymbol{w}$ for each object alignment in the current and target observations for scene graph generation. Our graph encoder network $f_\Theta$ computes the graph embeddings. Then, at each planning step, we use the object selector $h_\Phi$ to choose which object to move, and the $\delta$-proposal network $\pi_\Omega$ to generate candidate motions. The goal satisfaction network $r_\Psi$ predicts whether or not individual configurations will satisfy the objective, and the collision detection network rejects invalid proposals.
  • Figure 2: Examples of generated data. Objects are placed randomly on the table, and the motions applied to them are also chosen at random.
  • Figure 3: An example plan rollout showing how NeRP chose to move objects around in order to get between two goal states with very different arrangements of obstacles. In this case, it took 10 steps to get to the goal state.
  • Figure 4: Example of a planning sequence. The robot repeatedly selects which object to move and either moves it to the appropriate goal position or to a storage position in order to enable future execution.
  • Figure 5: Swapping an unseen mug and bowl using NeRP: For the given $X(\mathrm{Start})$ and $X(\mathrm{End})$ arrangements, NeRP generates an encoded scene graph, from which the object selection network ($h_{\Phi}$) selects an object in each scenario (e.g., 1, 3 & 5). The $\delta$-proposal network ($\pi_\Omega$) predicts $\boldsymbol{\delta} \in \Delta$ for the selected object, leading to its next placement region with a cost map $c_{map}$ (e.g., 2, 4 & 6). The $c_{map}$ is derived from the goal satisfaction network's ($r_\Psi$) scores, $\boldsymbol{v}$, and indicates the best placement locations to the robot during execution.
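
The captions for Figures 1 and 5 describe a loop of selecting an object, sampling candidate displacements, rejecting colliding placements, and ranking the rest by goal-satisfaction score. The following is a minimal sketch of that control flow only, not the authors' implementation. Every function is a hypothetical hand-written stand-in for a learned network ($h_\Phi$, $\pi_\Omega$, $r_\Psi$, and the collision detector). Objects are assumed to be 2D points with a fixed radius, and the storage-position behavior mentioned in Figure 4 is omitted.

```python
import math
import random

# NOTE: all functions here are hypothetical stand-ins for learned networks.

def select_object(current, goal, tol):
    """Stand-in for h_Phi: pick the unsatisfied object farthest from its goal."""
    pending = {k: math.dist(current[k], goal[k]) for k in current
               if math.dist(current[k], goal[k]) > tol}
    return max(pending, key=pending.get) if pending else None

def propose_deltas(rng, pos, target, n, sigma):
    """Stand-in for pi_Omega: n noisy displacements pointing toward the target."""
    return [(target[0] - pos[0] + rng.gauss(0, sigma),
             target[1] - pos[1] + rng.gauss(0, sigma)) for _ in range(n)]

def goal_score(pos, target):
    """Stand-in for r_Psi: higher is better, 1.0 exactly at the target."""
    return math.exp(-math.dist(pos, target))

def in_collision(pos, others, radius):
    """Stand-in for the collision detector: discs of equal radius overlap."""
    return any(math.dist(pos, o) < 2 * radius for o in others)

def rearrange(current, goal, rng, tol=0.05, n_samples=64,
              sigma=0.05, radius=0.05, max_steps=10):
    """Select, sample, filter, score, and move, until every object is near its goal."""
    current = dict(current)  # do not mutate the caller's scene
    plan = []
    for _ in range(max_steps):
        name = select_object(current, goal, tol)
        if name is None:  # all objects within tolerance of their goals
            break
        pos = current[name]
        others = [p for k, p in current.items() if k != name]
        deltas = propose_deltas(rng, pos, goal[name], n_samples, sigma)
        candidates = [(pos[0] + dx, pos[1] + dy) for dx, dy in deltas]
        valid = [c for c in candidates if not in_collision(c, others, radius)]
        if not valid:  # every sample collided; resample on the next iteration
            continue
        best = max(valid, key=lambda c: goal_score(c, goal[name]))
        current[name] = best
        plan.append((name, best))
    return current, plan

if __name__ == "__main__":
    start = {"mug": (0.0, 0.0), "bowl": (1.0, 0.0)}
    goal = {"mug": (0.5, 0.5), "bowl": (1.0, 0.6)}
    final, plan = rearrange(start, goal, random.Random(0))
    for name, placement in plan:
        print(name, placement)
```

In this sketch the scores over the valid candidates play the role that the cost map $c_{map}$ plays in Figure 5: they rank placement locations for the selected object. A swap task such as the one in Figure 5 would also need an intermediate placement, which this simplified selector does not model.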