WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models
Authors
R. Khorrambakht, Joaquim Ortiz-Haro, Joseph Amigo, Omar Mostafa, Daniel Dugas, Franziska Meier, Ludovic Righetti
Abstract
Robots must understand their environment from raw sensory inputs and reason
about the consequences of their actions in it to solve complex tasks. Behavior
Cloning (BC) leverages task-specific human demonstrations to learn this
knowledge as end-to-end policies. However, these policies are difficult to
transfer to new tasks, and generating training data is challenging because it
requires careful demonstrations and frequent environment resets. In contrast to
such policy-based view, in this paper we take a model-based approach where we
collect a few hours of unstructured easy-to-collect play data to learn an
action-conditioned visual world model, a diffusion-based action sampler, and
optionally a reward model. The world model -- in combination with the action
sampler and a reward model -- is then used to optimize long sequences of
actions with a Monte Carlo Tree Search (MCTS) planner. The resulting plans are
executed on the robot via a zeroth-order Model Predictive Controller (MPC). We
show that the action sampler mitigates hallucinations of the world model during
planning and validate our approach on 3 real-world robotic tasks with varying
levels of planning and modeling complexity. Our experiments support the
hypothesis that planning leads to a significant improvement over BC baselines
on a standard manipulation test environment.