Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies
Haojie Huang, Karl Schmeckpeper, Dian Wang, Ondrej Biza, Yaoyao Qian, Haotian Liu, Mingxi Jia, Robert Platt, Robin Walters
TL;DR
Imagination Policy tackles high-precision robotic manipulation by replacing direct action inference with a generative approach: a conditional point-flow model takes two input point clouds and imagines their target configuration. Rigid motions are then recovered via point-cloud registration, yielding $SE(3)$ actions, and a bi-equivariant design leverages task symmetries to improve sample efficiency and generalization. The method achieves state-of-the-art performance on challenging RLBench tasks and is validated on a real UR5 robot; the paper also provides ablations and extensions to longer-horizon and articulated-object scenarios. Limitations include the reliance on segmented point clouds and the speed of diffusion-based inference, which points to faster inference and broader object categories as directions for future work.
Abstract
Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick-and-place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states, which are then translated to actions using rigid action estimation. This transforms action inference into a local generative task. We leverage the pick-and-place symmetries underlying the tasks in the generation process and achieve extremely high sample efficiency and generalizability to unseen configurations. Finally, we demonstrate state-of-the-art performance across various tasks on the RLBench benchmark compared with several strong baselines and validate our approach on a real robot.
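To make the step from an imagined point cloud to a rigid action concrete, the following is a minimal sketch of least-squares rigid registration (the Kabsch/SVD method). It assumes that each imagined point corresponds one-to-one with a point in the observed cloud; that assumption, the function name `estimate_rigid_transform`, and the synthetic data are illustrative and are not taken from the paper, whose rigid action estimation may differ in detail.

```python
import numpy as np


def estimate_rigid_transform(source, target):
    """Return (R, t) minimizing sum ||R @ source_i + t - target_i||^2.

    Assumes source and target are (N, 3) arrays with row-wise correspondence
    (an illustrative assumption, not a claim about the paper's pipeline).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)

    # Center both clouds so that only the rotation remains to be solved.
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    H = (source - src_mean).T @ (target - tgt_mean)  # 3x3 cross-covariance

    # The SVD of the cross-covariance gives the optimal orthogonal matrix.
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    observed = rng.normal(size=(50, 3))  # stand-in for an observed object cloud

    # Synthetic "imagined" cloud: the observed cloud moved by a known pose.
    angle = 0.7
    R_true = np.array([
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.1, -0.2, 0.3])
    imagined = observed @ R_true.T + t_true

    R, t = estimate_rigid_transform(observed, imagined)
    print("max rotation error:", np.abs(R - R_true).max())
    print("max translation error:", np.abs(t - t_true).max())
```

With noise-free correspondences the recovered pose matches the applied one up to floating-point error. A generated cloud will only approximately be a rigid copy of the observation, in which case the same routine returns the closest rigid motion in the least-squares sense.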
