RISE: Self-Improving Robot Policy with Compositional World Model
Jiazhi Yang, Kunyang Lin, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li
TL;DR
RISE enables on-policy robotic reinforcement learning through imagination by using a Compositional World Model, which comprises a controllable dynamics model and a progress-aware value model, as an online learning environment. The framework warms up a real-world policy with offline data, then iteratively performs imagined rollouts and policy updates in the world model, using discretized advantages to guide learning. Across three real-world dexterous tasks, RISE improves on strong baselines and ablations, with absolute performance increases of more than +35% in dynamic brick sorting, +45% in backpack packing, and +35% in box closing, demonstrating the viability of imagination-driven self-improvement for complex manipulation. The work highlights practical pathways to scalable, data-efficient robotic intelligence. It also acknowledges limitations in realism, data balance, and compute cost, and suggests directions for uncertainty-aware imagination and integration with physical constraints.
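As a rough illustration of what "discretized advantages" could look like, the sketch below computes an advantage as the change in predicted progress over an imagined rollout and maps it to a small number of bins. The function names, the two thresholds, and the three-bin layout are assumptions made for illustration; the summary above does not specify how RISE discretizes advantages.

```python
from bisect import bisect_right


def progress_advantage(value_start, value_end):
    """Advantage as the change in predicted progress across a rollout (assumed form)."""
    return value_end - value_start


def discretize_advantage(advantage, thresholds=(-0.05, 0.05)):
    """Map a scalar advantage to an integer bin index.

    The thresholds are illustrative assumptions, not values from the paper.
    With two thresholds there are three bins: 0 (worse), 1 (neutral), 2 (better).
    """
    return bisect_right(thresholds, advantage)


# Hypothetical progress values, as a value model might assign to imagined frames.
improving = discretize_advantage(progress_advantage(0.2, 0.6))
regressing = discretize_advantage(progress_advantage(0.5, 0.2))
unchanged = discretize_advantage(progress_advantage(0.4, 0.4))
```

Binning makes the learning signal depend on which side of a threshold the advantage falls rather than on its exact magnitude, which may reduce sensitivity to noise in the value estimates.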
Abstract
Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risks, hardware cost, and the need for environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows state prediction and value estimation to each use the architecture and objective best suited to it. These components are integrated into a closed-loop, self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvements over prior art, with absolute performance increases of more than +35% in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.
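The closed loop described in the abstract (imagine a rollout, score it, update the policy) can be sketched with a toy example. Everything here is a stand-in chosen for illustration: the 1-D state, the additive `dynamics_model`, the distance-based `value_model`, and the preference-weight policy are assumptions and bear no relation to the video dynamics model, progress value model, or VLA policy used in RISE.

```python
import random

GOAL = 5.0                   # hypothetical 1-D goal position
ACTIONS = (-1.0, 0.0, 1.0)   # hypothetical discrete action set


def dynamics_model(state, action):
    """Stand-in for the controllable dynamics model: predicts the next state."""
    return state + action


def value_model(state):
    """Stand-in for the progress value model: progress toward GOAL in [0, 1]."""
    return max(0.0, 1.0 - abs(GOAL - state) / GOAL)


def imagine_rollout(prefs, start_state, horizon, rng):
    """Roll the policy forward inside the model; no real environment is touched."""
    state, taken = start_state, []
    for _ in range(horizon):
        # Sample an action index in proportion to its preference weight.
        idx = rng.choices(range(len(ACTIONS)), weights=prefs, k=1)[0]
        taken.append(idx)
        state = dynamics_model(state, ACTIONS[idx])
    # Advantage as progress gained over the imagined rollout (assumed form).
    advantage = value_model(state) - value_model(start_state)
    return taken, advantage


def self_improve(iterations=200, horizon=3, lr=0.5, seed=0):
    """Alternate imagined rollouts and advantage-weighted policy updates."""
    rng = random.Random(seed)
    prefs = [1.0, 1.0, 1.0]  # uniform weights play the role of the warmed-up policy
    for _ in range(iterations):
        taken, advantage = imagine_rollout(prefs, 0.0, horizon, rng)
        for idx in taken:
            prefs[idx] += lr * advantage  # reinforce actions from useful rollouts
    return prefs


prefs = self_improve()
```

In this toy setting the weight on the action that moves toward the goal grows faster than the weight on the action that moves away from it, because a rollout earns positive advantage only when its net displacement is toward the goal.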
