Strengthening Generative Robot Policies through Predictive World Modeling
Han Qi, Haocheng Yin, Aris Zhu, Yilun Du, Heng Yang
TL;DR
The paper addresses robustness gaps in robotic control by unifying a diffusion-based generative policy with a predictive, action-conditioned world model, which enables online planning. Generative predictive control (GPC) has three components: policy learning, world-model learning from expert and exploration data, and online planning with one of two planners (GPC-RANK and GPC-OPT). It can optionally use a reward predictor or zero-shot vision-language models for task guidance. Across state-based and vision-based tasks in simulation and real-world settings, GPC consistently outperforms pure behavior cloning, with gains amplified by multiple action proposals and gradient-based optimization, and it sometimes approaches the planning performance of a ground-truth simulator. The work shows that random exploration data is crucial for world-model accuracy and that the approach can leverage foundation models to handle diverse rewards and visual dynamics, a practical step toward robust, flexible robotic control.
Abstract
We present generative predictive control (GPC), a learning control framework that (i) clones a generative diffusion-based policy from expert demonstrations, (ii) trains a predictive action-conditioned world model from both expert demonstrations and random explorations, and (iii) synthesizes an online planner that ranks and optimizes the action proposals from (i) by looking ahead into the future using the world model from (ii). Across a variety of robotic manipulation tasks, we demonstrate that GPC consistently outperforms behavior cloning in both state-based and vision-based settings, in simulation and in the real world.
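To make step (iii) concrete, the ranking idea can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' implementation: the diffusion policy is replaced by a noisy hand-written sampler, the learned world model by a toy single-integrator dynamics function, and the reward by negative distance to a goal. All function names (`sample_action_proposals`, `world_model_rollout`, `reward`, `rank_proposals`) are hypothetical.

```python
import numpy as np


def sample_action_proposals(rng, state, num_proposals, horizon):
    """Stand-in for a generative policy (ASSUMPTION: not a diffusion model).

    Returns an array of shape (num_proposals, horizon, action_dim) holding
    noisy action sequences that roughly point from the state to the origin.
    """
    direction = -state / (np.linalg.norm(state) + 1e-8)
    base = np.tile(0.1 * direction, (horizon, 1))
    noise = 0.05 * rng.standard_normal((num_proposals, horizon, state.shape[0]))
    return base[None] + noise


def world_model_rollout(state, actions):
    """Stand-in for a learned action-conditioned world model (ASSUMPTION).

    Toy dynamics: next_state = state + action. Returns the predicted states
    after each action, with shape (horizon, state_dim).
    """
    return state + np.cumsum(actions, axis=0)


def reward(predicted_states, goal):
    """Hypothetical task reward: negative distance of the final state to the goal."""
    return -np.linalg.norm(predicted_states[-1] - goal)


def rank_proposals(state, proposals, goal):
    """Score every proposal by looking ahead with the world model; keep the best."""
    scores = np.array(
        [reward(world_model_rollout(state, actions), goal) for actions in proposals]
    )
    best_index = int(np.argmax(scores))
    return proposals[best_index], scores


rng = np.random.default_rng(0)
state = np.array([1.0, -0.5])
goal = np.zeros(2)
proposals = sample_action_proposals(rng, state, num_proposals=16, horizon=8)
best_actions, scores = rank_proposals(state, proposals, goal)
```

In this sketch, drawing more proposals can only raise (never lower) the best predicted score, which is one intuition for why multiple action proposals help. A gradient-based variant would instead refine a proposal by ascending the predicted reward through a differentiable world model; that is not shown here.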
