Closing the Train-Test Gap in World Models for Gradient-Based Planning

Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal, Yann LeCun, Oumayma Bounou, Pavel Izmailov, Micah Goldblum

TL;DR

The paper tackles the limited effectiveness of gradient-based planning (GBP) with world models trained only on one-step next-state prediction. It introduces Online World Modeling and Adversarial World Modeling to widen the training distribution and smooth the optimization landscape, enabling GBP to match or exceed gradient-free methods such as the cross-entropy method (CEM) in 10% of the time budget. Across object manipulation and navigation tasks, the methods substantially close the train-test gap and improve planning stability. The work highlights the practical viability of GBP for robust, real-time planning in high-dimensional latent spaces and suggests extensions to real-world settings and hierarchical world-model architectures.

Abstract

World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is instead used at test time to estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
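
To make the idea of gradient-based planning concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy scalar world model z_{t+1} = a*z_t + b*u_t in place of a learned latent model such as DINO-WM, and it uses a hand-derived gradient where a real system would use automatic differentiation. The names `rollout` and `plan_gbp`, and the constants `a`, `b`, `lr`, and `steps`, are hypothetical choices made for illustration only.

```python
def rollout(z0, actions, a=0.9, b=0.5):
    """Roll the toy world model z_{t+1} = a*z_t + b*u_t forward (assumed dynamics)."""
    z = z0
    for u in actions:
        z = a * z + b * u
    return z


def plan_gbp(z0, z_goal, horizon=5, steps=200, lr=0.1, a=0.9, b=0.5):
    """Optimize an action sequence by gradient descent on the goal-reaching loss.

    Loss: (z_H - z_goal)^2, where z_H is the final state predicted by the model.
    """
    actions = [0.0] * horizon  # start from the all-zero plan
    for _ in range(steps):
        err = rollout(z0, actions, a, b) - z_goal
        # For this linear model, d z_H / d u_t = a^(H-1-t) * b, so the
        # gradient of the loss with respect to u_t is 2 * err * a^(H-1-t) * b.
        grads = [2.0 * err * (a ** (horizon - 1 - t)) * b for t in range(horizon)]
        actions = [u - lr * g for u, g in zip(actions, grads)]
    return actions


if __name__ == "__main__":
    plan = plan_gbp(z0=0.0, z_goal=1.0)
    print("planned actions:", plan)
    print("predicted final state:", rollout(0.0, plan))
```

The planner only ever queries the model at action sequences it generates itself, which is where the train-test gap described in the abstract arises: a learned model trained on next-state prediction over expert data may be inaccurate, or have poorly behaved gradients, at exactly those inputs. The toy model here is exact, so it does not exhibit that failure.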

Paper Structure

This paper contains 38 sections, 11 equations, 11 figures, 11 tables, 5 algorithms.

Figures (11)

  • Figure 1: An overview of our two proposed methods. When planning with a world model, actions may result in trajectories that lie outside the distribution of expert trajectories on which the world model was trained, leading to inaccurate world modeling. Online World Modeling finetunes a pretrained world model by using the simulator to correct trajectories produced via gradient-based planning, leading to accurate world modeling beyond the expert trajectory distribution. Adversarial World Modeling finetunes a world model on perturbations of actions and expert trajectories, promoting robustness and smoothing the world model's input gradients.
  • Figure 2: Optimization landscape of DINO-WM (Zhou et al., 2025) before and after finetuning with our Adversarial World Modeling objective on the Push-T task. Adversarial World Modeling yields a smoother landscape with a broader basin around the optimum. Visualization details are given in the paper's visualization section.
  • Figure 3: Planning efficiency of DINO-WM, Online World Modeling, and Adversarial World Modeling on the PushT task. Gradient-based planning is orders of magnitude faster than CEM.
  • Figure 4: Difference in World Model Error between expert and planning trajectories on PushT.
  • Figure 5: Illustrations of the three tasks used in our main experiments and the two robotic manipulation tasks we further study in the paper's robotics section. Images from Zhou et al. (2025).
  • ...and 6 more figures