ForeRobo: Unlocking Infinite Simulation Data for 3D Goal-driven Robotic Manipulation
Dexin Wang, Faliang Chang, Chunsheng Liu
TL;DR
ForeRobo introduces ForeGen and ForeFormer to unlock virtually infinite, high-fidelity simulation data for 3D goal-state-driven robotic manipulation. ForeGen uses GPT-4 and CPCA-based state transfer to generate diverse tasks, scenes, and goal states, producing the ForeMani-v1 dataset. ForeFormer is a conditional diffusion model with transformer-based encoders that predicts the goal position of every point from the current scene and the task instruction, enabling zero-shot sim-to-real transfer with strong generalization across rigid and articulated objects. The approach improves on state-of-the-art state generation models by an average of 56.32% in simulation, reaches a 79.28% average success rate on more than 20 real-world tasks, and supports scalable, interpretable, planning-driven manipulation. Current limitations are the handling of deformable objects and the incomplete preservation of fine structural details, which point future work toward more general and robust manipulation in varied environments.
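To make ForeFormer's conditional-diffusion step more concrete, the following is a minimal NumPy sketch of DDPM-style ancestral sampling of a per-point 3D displacement field (goal position = current position + displacement). Several choices are assumptions rather than details from the paper: the linear noise schedule, the noise-prediction (epsilon) parameterization, and diffusing displacements instead of absolute goal positions. `make_oracle_denoiser` is a hypothetical placeholder for ForeFormer's transformer, included only so the sampling loop runs end to end.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def make_oracle_denoiser(true_disp, alpha_bars):
    """Hypothetical stand-in for ForeFormer's noise predictor.

    It knows the clean displacement x0, so it returns the exact noise eps
    satisfying x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    A trained model would instead infer eps from (x_t, t, points, task).
    """
    def denoiser(x_t, t, points, task):
        return (x_t - np.sqrt(alpha_bars[t]) * true_disp) / np.sqrt(1.0 - alpha_bars[t])
    return denoiser


def sample_goal_displacement(denoiser, points, task, schedule, seed=0):
    """DDPM ancestral sampling of an (N, 3) per-point displacement field."""
    betas, alphas, alpha_bars = schedule
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(points.shape)  # start from pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        # Predicted noise; a trained model would condition on points and task.
        eps = denoiser(x, t, points, task)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the last step
    return x


# Usage: a toy "lift the object by 10 cm" goal on 128 random scene points.
rng = np.random.default_rng(1)
points = rng.uniform(-0.5, 0.5, size=(128, 3))
true_disp = np.tile([0.0, 0.0, 0.1], (128, 1))
schedule = make_schedule()
denoiser = make_oracle_denoiser(true_disp, schedule[2])
disp = sample_goal_displacement(denoiser, points, "lift the mug", schedule)
goal_points = points + disp  # predicted 3D goal position of every point
```

Because the oracle returns the exact noise, the last denoising step recovers the target displacement exactly; with a learned denoiser the output would only approximate a plausible goal state for the given task.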
Abstract
Efficiently leveraging simulation to acquire advanced manipulation skills is both challenging and highly significant. We introduce \textit{ForeRobo}, a generative robotic agent that uses generative simulation to autonomously acquire manipulation skills driven by envisioned goal states. Instead of directly learning low-level policies, we advocate integrating generative paradigms with classical control. Our approach equips a robotic agent with a self-guided \textit{propose-generate-learn-actuate} cycle. The agent first proposes the skills to be acquired and constructs the corresponding simulation environments; it then configures objects into appropriate arrangements to generate skill-consistent goal states (\textit{ForeGen}). Subsequently, the virtually infinite data produced by ForeGen are used to train the proposed state generation model (\textit{ForeFormer}), which establishes point-wise correspondences by predicting the 3D goal position of every point in the current state, conditioned on the scene state and the task instruction. Finally, classical control algorithms drive the robot in real-world environments to execute actions that realize the envisioned goal states. Compared with end-to-end policy learning methods, ForeFormer offers superior interpretability and execution efficiency. We train and benchmark ForeFormer across a variety of rigid-body and articulated-object manipulation tasks and observe an average improvement of 56.32\% over state-of-the-art state generation models, demonstrating strong generality across different manipulation patterns. Moreover, in real-world evaluations involving more than 20 robotic tasks, ForeRobo achieves zero-shot sim-to-real transfer and generalizes well, attaining an average success rate of 79.28\%.
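Classical control needs a target that a planner can track. For a rigidly grasped object, one common option (an assumption here, not necessarily the authors' pipeline) is to fit a rigid transform to the predicted point-wise correspondences with the Kabsch (SVD) algorithm and apply the same motion to the gripper pose, which a standard motion planner or inverse-kinematics controller could then follow. The function names below are hypothetical, the goal points are synthesized rather than produced by ForeFormer, and an articulated object would need one such fit per rigid part.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Kabsch: least-squares R, t such that dst ≈ src @ R.T + t for (N, 3) arrays."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation (det = +1)
    t = dst_c - R @ src_c
    return R, t


def goal_gripper_pose(T_gripper, R, t):
    """Apply the object's rigid motion to a rigidly attached 4x4 gripper pose."""
    T_delta = np.eye(4)
    T_delta[:3, :3] = R
    T_delta[:3, 3] = t
    return T_delta @ T_gripper


# Usage: current points and their predicted goal positions (synthesized here).
rng = np.random.default_rng(0)
current = rng.uniform(-0.1, 0.1, size=(200, 3))
theta = np.pi / 6  # 30-degree rotation about the z-axis
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.05])
goal = current @ R_true.T + t_true  # stands in for a per-point goal prediction

R, t = fit_rigid_transform(current, goal)
T_gripper = np.eye(4)
T_gripper[:3, 3] = [0.0, 0.0, 0.3]
T_goal = goal_gripper_pose(T_gripper, R, t)
```

Because the fit is least-squares over all points, moderate noise in individual per-point predictions tends to average out before the target pose reaches the controller.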
