One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

Yuan Pu; Yazhe Niu; Jia Tang; Junyu Xiong; Shuai Hu; Hongsheng Li

TL;DR

This work systematically investigates key architectural designs for extending UniZero, identifies a Mixture-of-Experts (MoE) architecture as the most effective approach, and introduces an online Dynamic Parameter Scaling (DPS) strategy that dynamically allocates model capacity throughout the learning process.
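The abstract describes the MoE architecture as routing task-specific representations to specialized sub-networks. As a rough illustration of that mechanism only, here is a minimal pure-Python sketch of a generic top-k MoE layer: a gate scores every expert, the k best are selected, and their outputs are mixed with softmax weights renormalised over the chosen experts. `TopKMoE`, `top_k`, the linear experts, and the Gaussian initialisation are hypothetical choices for this example, not ScaleZero's actual implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def top_k(scores: Vector, k: int) -> List[int]:
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


class TopKMoE:
    """Hypothetical top-k MoE layer; an illustration, not ScaleZero's implementation."""

    def __init__(self, dim: int, num_experts: int, k: int = 1, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        # Each expert is an independent dim x dim linear map (a "sub-network").
        self.experts: List[Matrix] = [
            [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        # The gate produces one score per expert from the input token.
        self.gate: Matrix = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_experts)
        ]

    def route(self, x: Vector) -> List[int]:
        """Experts selected for input x."""
        return top_k(matvec(self.gate, x), self.k)

    def forward(self, x: Vector) -> Vector:
        scores = matvec(self.gate, x)
        chosen = top_k(scores, self.k)
        # Renormalise gate weights over the selected experts only.
        weights = softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for w, i in zip(weights, chosen):
            y = matvec(self.experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out
```

Only the selected experts contribute to a token's output, so in training only their parameters (plus the gate) would receive gradient from that token. If tokens from different tasks are routed to different experts, their updates to those experts cannot conflict. Whether tasks actually separate this way depends on the learned gate, which this sketch does not train.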

Abstract

In heterogeneous multi-task decision-making, tasks differ not only in their observation and action spaces but also substantially in their underlying complexity. While existing multi-task world models such as UniZero excel in single-task settings, we find that when they handle a broad and diverse suite of tasks, gradient conflicts and the loss of model plasticity often constrain their sample efficiency. In this work, we address these challenges from two complementary perspectives: the single learning iteration and the overall learning process. First, to mitigate gradient conflicts, we systematically investigate key architectural designs for extending UniZero. Our investigation identifies a Mixture-of-Experts (MoE) architecture as the most effective approach. We demonstrate, both theoretically and empirically, that this architecture alleviates gradient conflicts by routing task-specific representations to specialized sub-networks. This finding leads to our proposed model, ScaleZero. Second, to dynamically allocate model capacity throughout the learning process, we introduce an online Dynamic Parameter Scaling (DPS) strategy. This strategy progressively integrates LoRA adapters in response to task-specific progress, enabling adaptive knowledge retention and parameter expansion. Evaluations on a diverse set of standard benchmarks (Atari, DMC, Jericho) demonstrate that ScaleZero, using only online reinforcement learning with a single model, performs on par with specialized single-task agents. With the DPS strategy, it remains competitive while using only 71.5% of the environment interactions. These findings underscore the potential of ScaleZero for effective multi-task planning. Our code is available at https://github.com/opendilab/LightZero.
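The abstract states that DPS progressively integrates LoRA adapters in response to task-specific progress. The pure-Python sketch below assumes one concrete reading of that idea: a linear layer whose learned weights are frozen and extended with a fresh LoRA adapter each time the fraction of tasks marked as solved passes the next threshold in a schedule. The names `ScalableLinear`, `LoRAAdapter`, and `dps_step`, the `thresholds` schedule, the `solved` flags, and the freeze-on-expand rule are all hypothetical; the abstract does not specify the actual trigger or freezing policy. Following standard LoRA practice, the up-projection `B` starts at zero, so adding an adapter leaves the layer's output unchanged until it is trained.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


class LoRAAdapter:
    """Low-rank update delta(x) = (alpha / rank) * B @ (A @ x)."""

    def __init__(self, dim: int, rank: int, alpha: float, rng: random.Random) -> None:
        self.scale = alpha / rank
        self.A: Matrix = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(rank)]
        # B is zero-initialised, so a new adapter contributes nothing at insertion time.
        self.B: Matrix = [[0.0] * rank for _ in range(dim)]
        self.trainable = True

    def __call__(self, x: Vector) -> Vector:
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


class ScalableLinear:
    """Hypothetical base weight plus a growing stack of LoRA adapters."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.dim = dim
        self.W: Matrix = [[self.rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.base_trainable = True
        self.adapters: List[LoRAAdapter] = []

    def expand(self, rank: int = 2, alpha: float = 4.0) -> None:
        """Freeze everything learned so far and append a fresh trainable adapter."""
        self.base_trainable = False
        for adapter in self.adapters:
            adapter.trainable = False
        self.adapters.append(LoRAAdapter(self.dim, rank, alpha, self.rng))

    def __call__(self, x: Vector) -> Vector:
        out = matvec(self.W, x)
        for adapter in self.adapters:
            out = [o + d for o, d in zip(out, adapter(x))]
        return out


def dps_step(layer: ScalableLinear, solved: Dict[str, bool],
             thresholds: List[float], stage: int) -> int:
    """Hypothetical trigger: expand once the solved fraction reaches the next threshold."""
    frac = sum(solved.values()) / len(solved)
    if stage < len(thresholds) and frac >= thresholds[stage]:
        layer.expand()
        return stage + 1
    return stage
```

In this reading, frozen parameters retain what earlier stages learned, while each new adapter adds a small amount of trainable capacity for the tasks that are still unsolved. A real implementation would attach adapters inside the transformer layers and would use the paper's own progress criterion.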
