ClutterGen: A Cluttered Scene Generator for Robot Learning

Yinsen Jia; Boyuan Chen

TL;DR

ClutterGen reframes the generation of diverse, physically plausible cluttered scenes for robot learning as a closed-loop reinforcement learning problem with physics-based rewards, enabling autonomous generation without curated datasets. It employs an auto-regressive policy that places objects sequentially inside a queried region, using a Beta-distributed action output and dual encoders for geometry and history to promote diversity and stability. The authors validate the approach in simulation and on real robots, demonstrating clutter rearrangement and stable object placement, and show that ClutterGen can serve as a fast synthetic data generator for robust sim-to-real transfer. Overall, ClutterGen provides a scalable path to open-ended scene generation that enhances robustness and efficiency in robotic manipulation research.
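To make the "Beta-distributed action output" concrete, the following is a minimal sketch of how a Beta sample can be mapped onto a bounded placement region. It is not the authors' implementation: the function name `sample_placement`, the shape parameters, and the region bounds are all hypothetical, and in the actual method the shape parameters would come from the learned policy rather than being fixed by hand.

```python
import random

def sample_placement(alpha, beta, region_min, region_max, rng=random):
    """Draw one position inside an axis-aligned region.

    alpha, beta: per-dimension Beta shape parameters (hypothetical stand-ins
                 for what a policy network would output).
    region_min, region_max: per-dimension bounds of the queried region.
    """
    position = []
    for a, b, lo, hi in zip(alpha, beta, region_min, region_max):
        u = rng.betavariate(a, b)            # u lies in [0, 1]
        position.append(lo + u * (hi - lo))  # rescale to [lo, hi]
    return position

# Illustrative values only: a 0.6 m x 0.4 m region centred at the origin.
rng = random.Random(0)
samples = [
    sample_placement([2.0, 2.0], [2.0, 2.0], [-0.3, -0.2], [0.3, 0.2], rng)
    for _ in range(1000)
]
```

Because a Beta distribution has bounded support, every sample already lies inside the region and no clipping is needed, which is one plausible reason such an output suits placement within a confined surface.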

Abstract

We introduce ClutterGen, a physically compliant simulation scene generator capable of producing highly diverse, cluttered, and stable scenes for robot learning. Generating such scenes is challenging as each object must adhere to physical laws like gravity and collision. As the number of objects increases, finding valid poses becomes more difficult, necessitating significant human engineering effort, which limits the diversity of the scenes. To overcome these challenges, we propose a reinforcement learning method that can be trained with physics-based reward signals provided by the simulator. Our experiments demonstrate that ClutterGen can generate cluttered object layouts with up to ten objects on confined table surfaces. Additionally, our policy design explicitly encourages the diversity of the generated scenes for open-ended generation. Our real-world robot results show that ClutterGen can be directly used for clutter rearrangement and stable placement policy training.
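The sequential, simulator-checked placement idea can be sketched as a simple loop. This is an illustrative toy under stated assumptions, not the paper's method: `generate_scene`, `propose_pose`, `is_stable`, `MIN_GAP`, and `max_attempts` are hypothetical names, poses are reduced to a single coordinate on a unit-length table, the stability check is a spacing rule standing in for a physics simulator, and the proposal is uniform random rather than a learned policy conditioned on the attempt history.

```python
import random

def generate_scene(objects, propose_pose, is_stable, max_attempts=5):
    """Place objects one at a time; retry each until stable or attempts run out."""
    placed = []                       # (object, pose) pairs accepted so far
    for obj in objects:
        history = []                  # failed poses for the current object
        for _ in range(max_attempts):
            pose = propose_pose(obj, placed, history)
            if is_stable(obj, pose, placed):
                placed.append((obj, pose))
                break
            history.append(pose)      # record the failure for the next proposal
        else:
            return placed, False      # this object could not be placed
    return placed, True

# Toy stand-ins (assumptions): 1-D poses and a minimum-spacing "stability" rule.
rng = random.Random(0)
MIN_GAP = 0.08

def propose_pose(obj, placed, history):
    # A learned policy would use `placed` and `history`; this stub ignores them.
    return rng.uniform(0.0, 1.0)

def is_stable(obj, pose, placed):
    return all(abs(pose - other) >= MIN_GAP for _, other in placed)

placed, ok = generate_scene(["mug", "bowl", "camera"], propose_pose, is_stable)
```

The `history` list is where a closed-loop generator differs from one-shot sampling: each failed attempt is kept so that later proposals for the same object can take it into account.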

Paper Structure (12 sections, 4 equations, 12 figures, 5 tables)

Introduction
Related Work
The ClutterGen Framework
Problem Formulation
Cluttered Scene Generator
Implementation Details
Experiments
Scene Generation Evaluation
Scene-level Generalization
Real Robotics Task: Clutter Rearrangement
Real Robotics Task: Stable Object Placement
Conclusion, Limitations, and Future Work

Figures (12)

Figure 1: (a) The success rate of generating a stable simulation setup. When the number of objects in the environment increases, the difficulty of creating such a stable setup also increases. The traditional heuristic method cannot create a simulation scene above 7 objects, while ClutterGen consistently achieves high success rates. (b) Diverse, cluttered, and stable simulation setups created by ClutterGen.
Figure 2: ClutterGen. We stack a sequence of attempt histories as input to the history sequence encoder to generate the history feature. This feature, combined with the perception feature from the point cloud encoder, is taken by ClutterGen to output the placement pose for the queried object. The simulator evaluates the placement's stability, determining whether to proceed to the next attempt or the next queried object placement.
Figure 3: Average stable steps across different numbers of attempts. We computed the average stable steps for object placements requiring $\geq 3$ attempts. The x-axis represents the $i$-th attempt, and the y-axis represents the average number of simulation steps for the object to stabilize. ClutterGen's closed-loop re-attempt mechanism progressively improves placement stability.
Figure 4: Example of the closed-loop generation process of ClutterGen. ClutterGen attempts to place a camera on a cluttered table. Failed placement attempts (e.g., floating in the air, colliding with objects, or inserting into the table) are marked by red circles. After each attempt, the simulator runs and records the movement trajectory. ClutterGen uses all previous failed attempts to adjust its future actions until achieving a successful placement.
Figure 5: Generation diversity. A projected view of the queried object's placements. The black dashed line represents the supporting surface area. The blue dots are queried object placement positions $(x, y)$ across 500 setups. The red box is the coverage area, bounding all placement positions. Beta distribution greatly enhances scene diversity.
...and 7 more figures
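The coverage area described in the Figure 5 caption, a box bounding all placement positions, can be computed as in the sketch below. The function names `coverage_box` and `coverage_ratio` are hypothetical, and normalizing the box area by the supporting-surface area is an assumption made here for illustration; the caption itself only defines the bounding box.

```python
def coverage_box(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def coverage_ratio(points, surface):
    """Bounding-box area divided by the supporting-surface area (assumed metric).

    surface: (x_min, y_min, x_max, y_max) of the supporting surface.
    """
    x0, y0, x1, y1 = coverage_box(points)
    sx0, sy0, sx1, sy1 = surface
    return ((x1 - x0) * (y1 - y0)) / ((sx1 - sx0) * (sy1 - sy0))

# Made-up placement positions on a unit-square surface.
positions = [(0.1, 0.2), (0.4, 0.1), (0.3, 0.5)]
box = coverage_box(positions)
ratio = coverage_ratio(positions, (0.0, 0.0, 1.0, 1.0))
```

For these three points the box spans 0.3 by 0.4, so it covers roughly 12% of the unit surface; a more diverse generator would push this ratio closer to 1.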
