AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, Ping Luo
TL;DR
AdaptDiffuser introduces a self-evolving diffusion-based planner for offline RL that generates diverse, high-quality synthetic demonstrations guided by reward gradients, filters them with a discriminator, and fine-tunes the diffusion model to improve planning on seen tasks and generalize to unseen tasks without extra expert data. By incorporating reward-to-go conditioning and dynamics-consistency constraints, it achieves notable gains over prior diffusion planners on Maze2D and MuJoCo benchmarks and demonstrates zero-shot adaptation to new tasks like KUKA pick-and-place. The work includes extensive ablations on iterative data generation, data sufficiency, and model size, and discusses practical considerations such as training-time costs and potential extensions to high-dimensional observations and diverse maze layouts. Overall, AdaptDiffuser provides a robust framework for adaptive, task-general diffusion-based planning in offline settings with meaningful real-world implications for autonomous robots and goal-conditioned control.
Abstract
Diffusion models have demonstrated powerful generative capability in many tasks and have great potential to serve as a paradigm for offline reinforcement learning. However, the quality of a diffusion model is limited by insufficient diversity in its training data, which hinders both planning performance and generalization to new tasks. This paper introduces AdaptDiffuser, an evolutionary planning method with diffusion that can self-evolve to improve the diffusion model, and hence the planner, both on seen tasks and when adapting to unseen tasks. AdaptDiffuser generates rich synthetic expert data for goal-conditioned tasks using guidance from reward gradients. It then selects high-quality data via a discriminator to fine-tune the diffusion model, which improves generalization to unseen tasks. Empirical experiments on two benchmark environments and two carefully designed unseen tasks in the KUKA industrial robot arm and Maze2D environments demonstrate the effectiveness of AdaptDiffuser. For example, AdaptDiffuser not only outperforms the prior method Diffuser by 20.8% on Maze2D and 7.5% on MuJoCo locomotion, but also adapts better to new tasks, e.g., KUKA pick-and-place, by 27.9% without requiring additional expert data. More visualization results and demo videos can be found on our project page.
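The generate, select, and fine-tune cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "model" here is only a mean vector standing in for a diffusion planner, the reward is a hypothetical negative squared distance to a goal, the "discriminator" is a simple top-fraction filter by reward, and all function names and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def reward(x, goal):
    # Toy reward (assumption): negative squared distance of each sample to the goal.
    return -np.sum((x - goal) ** 2, axis=-1)

def reward_grad(x, goal):
    # Analytic gradient of the toy reward with respect to x.
    return -2.0 * (x - goal)

def guided_sample(mean, goal, n, rng, steps=10, scale=0.05, noise=0.3):
    # Stand-in for reward-guided sampling: draw noisy samples around the
    # current model mean, then take a few gradient-ascent steps on the reward.
    x = mean + noise * rng.standard_normal((n, mean.shape[0]))
    for _ in range(steps):
        x = x + scale * reward_grad(x, goal)
    return x

def discriminator(x, goal, keep_frac=0.5):
    # Stand-in for the discriminator: keep the top fraction of samples by reward.
    r = reward(x, goal)
    k = max(1, int(keep_frac * len(x)))
    idx = np.argsort(r)[-k:]
    return x[idx]

def finetune(mean, data, lr=0.5):
    # Stand-in for fine-tuning: move the model mean toward the selected data.
    return mean + lr * (data.mean(axis=0) - mean)

def self_evolve(mean, goal, rounds=5, n=256, seed=0):
    # One round = generate guided synthetic data, filter it, update the model.
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        samples = guided_sample(mean, goal, n, rng)
        selected = discriminator(samples, goal)
        mean = finetune(mean, selected)
    return mean
```

Calling `self_evolve(np.zeros(2), np.array([1.0, -1.0]))` moves the toy model toward the goal over successive rounds, which mirrors, in a very reduced form, how filtered synthetic data is fed back into the planner.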
