RISE: Self-Improving Robot Policy with Compositional World Model

Jiazhi Yang, Kunyang Lin, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li

TL;DR

RISE enables on-policy robotic reinforcement learning through imagination by using a Compositional World Model, which pairs a controllable dynamics model with a progress-aware value model, as an online learning environment. The framework warms up a real-world policy with offline data, then alternates between imagined rollouts and policy updates inside the world model, using discretized advantages to guide learning. Across three real-world dexterous tasks, RISE achieves substantial performance gains over strong baselines and ablations, demonstrating the viability of imagination-driven self-improvement for complex manipulation. The work points to practical pathways toward scalable, data-efficient robotic intelligence. It acknowledges limitations in realism, data balance, and compute cost, and suggests uncertainty-aware imagination and integration with physical constraints as future directions.

Abstract

Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risks, hardware cost, and the need for environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view futures via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows state prediction and value estimation to be handled by distinct architectures and objectives, each best suited to its role. These components are integrated into a closed-loop self-improving pipeline that continuously generates imagined rollouts, estimates advantages, and updates the policy in imagination, without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with absolute performance gains of more than +35% in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.
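
The compositional idea in the abstract, a dynamics model that imagines futures plus a separate value model that scores them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `CompositionalWorldModel`, `toy_dynamics`, and `toy_progress` are hypothetical, states and actions are reduced to single floats, and the advantage is taken to be the progress gained over an imagined rollout, which may differ from the paper's actual definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = float   # toy stand-in for a multi-view observation
Action = float  # toy stand-in for an action chunk


@dataclass
class CompositionalWorldModel:
    """Pairs a dynamics model with a separate progress value model (hypothetical stubs)."""
    dynamics: Callable[[State, Action], State]
    value: Callable[[State], float]

    def imagine(self, state: State, actions: Sequence[Action]) -> List[State]:
        """Roll the dynamics model forward; no real robot is involved."""
        states = [state]
        for action in actions:
            states.append(self.dynamics(states[-1], action))
        return states

    def advantage(self, state: State, actions: Sequence[Action]) -> float:
        """Assumed advantage: progress gained between the first and last imagined state."""
        states = self.imagine(state, actions)
        return self.value(states[-1]) - self.value(states[0])


def toy_dynamics(state: State, action: Action) -> State:
    # Assumption: the action simply shifts the scalar state.
    return state + action


def toy_progress(state: State) -> float:
    # Assumption: progress is the state clipped to [0, 1], with the goal at 1.0.
    return max(0.0, min(1.0, state))


wm = CompositionalWorldModel(dynamics=toy_dynamics, value=toy_progress)
good = wm.advantage(0.25, [0.25, 0.25])  # actions that move toward the goal
bad = wm.advantage(0.25, [-0.25, 0.0])   # actions that move away from it
print(good, bad)
```

Because the two models are separate callables, either one could be swapped for a different architecture without touching the other, which is the property the abstract attributes to the compositional design.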

Paper Structure (40 sections, 8 equations, 19 figures, 10 tables)

Figures (19)

  • Figure 1: We present RISE, a framework for Reinforcement learning via Imagination for SElf-improving robots. (a) Conventional physical-world RL is bottlenecked by hardware cost, slow serial interaction, and the need for manual reset. (b) RISE shifts the learning environment to a Compositional World Model, which first emulates future observations for proposed actions, then evaluates imagined states to derive advantage for policy improvement. (c) Training on massive imaginative rollouts effectively bootstraps RISE's performance across a variety of complex, contact-rich tasks, surpassing prior art by a non-trivial margin.
  • Figure 2: Evaluation task suite of RISE. Left: Tabletop setting. Right: Zoomed-in details of each task procedure. Dynamic Brick Sorting involves precisely picking up colored bricks from a moving conveyor and placing them into the corresponding color-designated bins. Backpack Packing requires the robot to open, insert clothes, lift, and zip the backpack. Box Closing necessitates subtle controls to fold the flap and tuck the tab into the box precisely.
  • Figure 3: Qualitative imaginations produced by RISE. Given initial multi-view context and candidate action chunks, RISE can (a) emulate a variety of futures accordingly, (b) simulate failure cases with corresponding reward drops, and (c) maintain coherent predictions consistent with real executions.
  • Figure 4: Workflow of compositional world model. Top: Training recipe upon proper model initialization. Bottom: Inference pipeline that yields rewarded samples for policy optimization. Both modules are compatible with multi-view images. We omit text prompt for both policy and value model for brevity.
  • Figure 5: Self-improving loop of RISE. Our learning pipeline encompasses two stages. Top: Rollout stage. Prompted with an optimal advantage, the rollout policy interacts with the world model to produce rollout data. Bottom: Training stage. The behavior policy is then trained to generate proper actions under an advantage-conditioning scheme.
  • ...and 14 more figures
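
The two stages in the Figure 5 caption, together with the discretized advantages mentioned in the TL;DR, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `discretize_advantage` and `relabel`, the threshold values, and the three-bin layout are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical thresholds splitting advantages into three bins:
# 0 = harmful, 1 = neutral, 2 = helpful.
THRESHOLDS: Tuple[float, ...] = (-0.1, 0.1)

# Assumption: at rollout time the policy is prompted with the highest bin,
# standing in for the "optimal advantage" in the Figure 5 caption.
OPTIMAL_BIN = len(THRESHOLDS)


def discretize_advantage(advantage: float,
                         thresholds: Sequence[float] = THRESHOLDS) -> int:
    """Return the bin index: the number of thresholds the advantage exceeds."""
    return sum(advantage > t for t in thresholds)


def relabel(samples: List[Tuple[str, str, float]]) -> List[Tuple[str, str, int]]:
    """Training-stage relabeling: attach a discrete advantage bin to each
    (observation, action, advantage) sample so a policy can be conditioned on it."""
    return [(obs, act, discretize_advantage(adv)) for obs, act, adv in samples]


# Toy imagined rollout data; the strings are placeholders for observations and actions.
rollouts = [("obs_a", "act_a", 0.5), ("obs_b", "act_b", 0.0), ("obs_c", "act_c", -0.25)]
labeled = relabel(rollouts)
print(labeled, OPTIMAL_BIN)
```

Under this scheme the policy sees every bin during training, including low-advantage samples, but is asked only for the top bin when generating new rollouts.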