CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation

Adithyavairavan Murali; Arsalan Mousavian; Clemens Eppner; Adam Fishman; Dieter Fox

TL;DR

CabiNet tackles robotic rearrangement in clutter without explicit object models by learning a fast 3D collision predictor from partial point clouds, trained on 650K procedurally generated scenes. It introduces an implicit 3D scene encoder and an SDF-based waypoint sampler. Combined with an MPPI planner, these enable collision-aware trajectory generation and navigation through tight spaces. The approach transfers directly from simulation to the real world and achieves higher collision detection accuracy and better rearrangement performance than baselines in both simulated and real-world experiments. By removing the need for explicit object models and real-world training data, this work makes neural rearrangement more scalable while remaining robust in previously unseen environments.
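The TL;DR mentions an SDF-based waypoint sampler. In CabiNet the waypoints are predicted by the network; the sketch below only illustrates, under our own assumptions, how a clearance-maximizing target could be read off a signed distance field. It assumes a small boolean occupancy grid, computes the SDF by brute force in voxel units, and uses a hypothetical heuristic (`pick_waypoint`: largest clearance, ties broken by proximity to the gripper). None of these names come from the paper.

```python
import itertools
import math


def signed_distance_field(occ):
    """Brute-force SDF over a boolean grid occ[x][y][z], in voxel units.

    Free voxels get +distance to the nearest occupied voxel; occupied voxels
    get -distance to the nearest free voxel.
    """
    nx, ny, nz = len(occ), len(occ[0]), len(occ[0][0])
    cells = list(itertools.product(range(nx), range(ny), range(nz)))
    occupied = [c for c in cells if occ[c[0]][c[1]][c[2]]]
    free = [c for c in cells if not occ[c[0]][c[1]][c[2]]]
    sdf = {}
    for c in cells:
        if occ[c[0]][c[1]][c[2]]:
            sdf[c] = -min((math.dist(c, f) for f in free), default=math.inf)
        else:
            sdf[c] = min((math.dist(c, o) for o in occupied), default=math.inf)
    return sdf


def pick_waypoint(sdf, gripper, min_clearance=1.0):
    """Hypothetical heuristic: the voxel with the largest clearance, ties
    broken by distance to the gripper voxel. Returns None if no voxel has
    at least min_clearance."""
    candidates = [c for c, d in sdf.items() if d >= min_clearance]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (sdf[c], -math.dist(c, gripper)))


# Toy 5x1x1 corridor: obstacles at both ends, free space in between.
occ = [[[True]], [[False]], [[False]], [[False]], [[True]]]
sdf = signed_distance_field(occ)
print(sdf[(2, 0, 0)])                 # 2.0
print(pick_waypoint(sdf, (1, 0, 0)))  # (2, 0, 0)
```

The brute-force SDF is quadratic in the number of voxels; a practical system would use a distance transform such as `scipy.ndimage.distance_transform_edt` instead.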

Abstract

We address the problem of generalizing robotic rearrangement to clutter without any explicit object models. We first generate over 650K cluttered scenes, orders of magnitude more than prior work, in diverse everyday environments such as cabinets and shelves. We render synthetic partial point clouds from these scenes and use them to train our CabiNet model architecture. CabiNet is a collision model that takes object and scene point clouds, captured from a single-view depth observation, and predicts collisions for SE(3) object poses in the scene. Our representation has a fast inference speed of 7 microseconds per query, with nearly 20% higher performance than baseline approaches in challenging environments. We use this collision model in conjunction with a Model Predictive Path Integral (MPPI) planner to generate collision-free trajectories for picking and placing in clutter. CabiNet also predicts waypoints, computed from the scene's signed distance field (SDF), which allow the robot to navigate tight spaces during rearrangement. This improves rearrangement performance by nearly 35% compared to baselines. We systematically evaluate our approach in procedurally generated simulated experiments and demonstrate that it transfers directly to the real world, despite being trained exclusively in simulation. Videos of robot experiments in previously unseen scenes and with unseen objects are available at https://cabinet-object-rearrangement.github.io
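To make the interaction between a collision model and MPPI concrete, here is a minimal MPPI loop for a point-like gripper in 3D. It assumes single-integrator dynamics $x_{t+1} = x_t + u_t$ with clipped controls, a cost equal to the distance to the goal plus a large penalty for predicted collisions, and softmin (path-integral) weighting of sampled control perturbations. The function `in_collision` is a hypothetical spherical obstacle standing in for CabiNet's learned collision queries; all parameter values are illustrative, not the paper's.

```python
import math
import random


def clip(v, bound):
    return max(-bound, min(bound, v))


def in_collision(p, center=(0.5, 0.3, 0.0), radius=0.15):
    """Hypothetical stand-in for a learned collision query: a sphere."""
    return math.dist(p, center) < radius


def rollout_cost(x0, controls, goal, max_step, penalty=1e3):
    """Integrate x_{t+1} = x_t + clip(u_t); sum goal distance + collision penalty."""
    x, cost = list(x0), 0.0
    for u in controls:
        x = [xi + clip(ui, max_step) for xi, ui in zip(x, u)]
        cost += math.dist(x, goal) + (penalty if in_collision(x) else 0.0)
    return cost


def mppi(start, goal, steps=60, horizon=10, samples=64, sigma=0.05,
         lam=0.1, max_step=0.1, seed=0):
    rng = random.Random(seed)
    U = [[0.0] * 3 for _ in range(horizon)]  # nominal control sequence
    x, path = list(start), [tuple(start)]
    for _ in range(steps):
        # 1) Sample Gaussian perturbations of U and score each rollout.
        noises, costs = [], []
        for _ in range(samples):
            eps = [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(horizon)]
            perturbed = [[u + e for u, e in zip(ut, et)] for ut, et in zip(U, eps)]
            noises.append(eps)
            costs.append(rollout_cost(x, perturbed, goal, max_step))
        # 2) Path-integral weights: exp(-(S_k - min S) / lambda), normalized below.
        c_min = min(costs)
        weights = [math.exp(-(c - c_min) / lam) for c in costs]
        total = sum(weights)
        # 3) Move the nominal sequence by the weighted average perturbation.
        for t in range(horizon):
            for i in range(3):
                delta = sum(w * n[t][i] for w, n in zip(weights, noises)) / total
                U[t][i] = clip(U[t][i] + delta, max_step)
        # 4) Execute the first control, then shift U forward (warm start).
        x = [xi + ui for xi, ui in zip(x, U[0])]
        path.append(tuple(x))
        U = U[1:] + [[0.0] * 3]
    return path


goal = (1.0, 0.0, 0.0)
path = mppi(start=(0.0, 0.0, 0.0), goal=goal)
print(len(path), round(math.dist(path[-1], goal), 3))
```

Because the collision model is only queried inside the cost, replacing the stand-in with a batched learned predictor leaves the MPPI update unchanged. Fast per-query inference matters here: every planning step evaluates `samples × horizon` collision queries.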

Paper Structure

This paper contains 11 sections, 1 equation, 6 figures, and 5 tables.

Introduction
Related Work
Procedural Data Generation
Neural Rearrangement Planning
Experimental Evaluation
Evaluation on Collision Benchmark
Object Rearrangement Evaluation in Simulation
Real Robot Experiments
Conclusion
Additional Examples of Synthetic Data and Scenes
Trajectory Generation

Figures (6)

Figure 1: (Right) CabiNet performs complex rearrangement tasks in novel, cluttered scenes on the real robot from partial point cloud observations alone, without object or environment models. (Left) The model is trained on over 650K procedurally generated synthetic scenes.
Figure 2: An example of a procedurally generated CabiNet rearrangement scene. The target object (purple) is selected only if it has a set of valid, collision-free grasp poses. The placement shelf (green) is selected if (a) it differs from the shelf the object originates from and (b) it admits a valid placement pose for the target object.
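The selection rules in Figure 2 can be expressed as two filters followed by random sampling. The sketch assumes hypothetical `SceneObject` records that already store their collision-free grasps and a caller-supplied `has_valid_placement` predicate; grasp generation and placement checking themselves are out of scope, and none of these names come from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """Hypothetical record for an object placed by the scene generator."""
    name: str
    shelf_id: int
    collision_free_grasps: list = field(default_factory=list)


def select_targets(objects, min_grasps=1):
    """Eligible targets: objects with at least min_grasps collision-free grasps."""
    return [o for o in objects if len(o.collision_free_grasps) >= min_grasps]


def select_placement_shelves(target, shelves, has_valid_placement):
    """Eligible shelves: (a) not the source shelf and (b) with a valid placement pose."""
    return [s for s in shelves
            if s != target.shelf_id and has_valid_placement(target, s)]


def sample_task(objects, shelves, has_valid_placement, rng):
    """Sample a (target, placement shelf) pair, or None if no pair is valid."""
    candidates = []
    for obj in select_targets(objects):
        placements = select_placement_shelves(obj, shelves, has_valid_placement)
        if placements:
            candidates.append((obj, placements))
    if not candidates:
        return None
    obj, placements = rng.choice(candidates)
    return obj, rng.choice(placements)


objects = [
    SceneObject("mug", shelf_id=0, collision_free_grasps=["g1", "g2"]),
    SceneObject("box", shelf_id=1),  # fully blocked: no collision-free grasps
    SceneObject("can", shelf_id=1, collision_free_grasps=["g3"]),
]
has_space = {0: True, 1: True, 2: False}  # toy stand-in: shelf 2 is full
target, shelf = sample_task(objects, [0, 1, 2],
                            lambda obj, s: has_space[s], random.Random(0))
print(target.name, shelf)  # prints one of: "mug 1" or "can 0"
```

Filtering before sampling guarantees that every generated task is feasible by construction, so no scene is wasted on a target that cannot be grasped or placed.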
Figure 3: The CabiNet architecture first encodes the scene point cloud with voxelization and 3D convolutions (top). The robot is shown only for visualization; in practice, the robot point cloud is removed from the scene. The scene features are then combined with the object features to predict scene-object collision queries. We also predict waypoints (blue points) for rearrangement, conditioned on the latent vector $z$ and the current gripper position (green).
Figure 4: CabiNet is used for pick-and-place tasks in the real world. The scenes in the middle and the right are out-of-distribution environments in a real IKEA kitchen.
Figure 5: Examples of failure cases. Left: the roof is partially occluded, leading to a collision with the wrist camera during grasping. Middle: the grasped object collides with a barrier. Right: a pick failure.
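Figure 3 begins by voxelizing the scene point cloud. The sketch shows that step as a dense occupancy grid, plus a purely geometric SE(3) collision check that transforms object points by a pose $(R, t)$ and tests them against the grid. This check is an illustrative stand-in, not CabiNet's learned scene-object query, which operates on 3D-convolutional features; the grid layout, voxel size, and function names are assumptions.

```python
import math


def to_index(p, origin, voxel_size):
    """Integer voxel index of a 3D point."""
    return tuple(math.floor((pc - oc) / voxel_size) for pc, oc in zip(p, origin))


def voxelize(points, origin, voxel_size, dims):
    """Dense occupancy grid grid[x][y][z] (0/1); out-of-bounds points are dropped."""
    nx, ny, nz = dims
    grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for p in points:
        i, j, k = to_index(p, origin, voxel_size)
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            grid[i][j][k] = 1
    return grid


def transform(points, R, t):
    """Apply an SE(3) pose (3x3 rotation R, translation t) to each point."""
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
            for p in points]


def geometric_collision(grid, obj_points, R, t, origin, voxel_size):
    """True if any posed object point lands in an occupied voxel."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    for p in transform(obj_points, R, t):
        i, j, k = to_index(p, origin, voxel_size)
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz and grid[i][j][k]:
            return True
    return False


# Toy scene: a flat shelf board filling the bottom layer of a 4x4x4 grid.
coords = [0.05, 0.15, 0.25, 0.35]
board = [(x, y, 0.05) for x in coords for y in coords]
grid = voxelize(board, origin=(0.0, 0.0, 0.0), voxel_size=0.1, dims=(4, 4, 4))

# Object: the 8 corners of a 4 cm cube centred at its own origin.
cube = [(dx, dy, dz) for dx in (-0.02, 0.02)
        for dy in (-0.02, 0.02) for dz in (-0.02, 0.02)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(geometric_collision(grid, cube, identity, (0.15, 0.15, 0.05), (0.0, 0.0, 0.0), 0.1))  # True
print(geometric_collision(grid, cube, identity, (0.15, 0.15, 0.25), (0.0, 0.0, 0.0), 0.1))  # False
```

A geometric check like this is only as good as the observed occupancy: regions occluded in a single-view point cloud simply appear free.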