
MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios

Xuantang Xiong, Ni Mu, Runpeng Xie, Senhao Yang, Yaqing Wang, Lexiang Wang, Yao Luan, Siyuan Li, Shuang Xu, Yiqin Yang, Bo Xu

TL;DR

MrCoM introduces a Meta-Regularized Contextual World-Model that generalizes across multi-scenario reinforcement learning tasks. It decomposes the latent state into stochastic, deterministic, and auxiliary components. It then uses meta-state regularization to extract scenario-relevant information and meta-value regularization to align model optimization with policy learning. The authors derive a generalization error bound in the Meta-POMDP setting. Experiments on DMControl/MuJoCo benchmarks show stronger cross-scenario performance than DreamerV3, CaDM, and MAMBA. The approach targets three error sources (dynamics, state representation, and policy differences) and supports cross-domain planning with a single, unified world-model. Key contributions are:
- the latent state factorization
- the two regularization mechanisms
- the formal generalization bound
- multi-scenario evaluations showing cross-scenario transfer and robustness to changes in dynamics and observations

Abstract

Model-based reinforcement learning (MBRL) is a crucial approach to enhancing the generalization capabilities and sample efficiency of RL algorithms. However, current MBRL methods focus primarily on building world models for single tasks and rarely address generalization across different scenarios. Building on the insight that dynamics within the same simulation engine share inherent properties, we aim to construct a unified world model that generalizes across different scenarios, named the Meta-Regularized Contextual World-Model (MrCoM). MrCoM first decomposes the latent state space into components according to their dynamic characteristics, which improves the accuracy of world-model prediction. It then adopts meta-state regularization to extract a unified representation of scenario-relevant information, and meta-value regularization to align world-model optimization with policy learning across diverse scenario objectives. We theoretically analyze the upper bound on the generalization error of MrCoM in multi-scenario settings. We systematically evaluate the generalization ability of our algorithm across diverse scenarios, and it performs significantly better than previous state-of-the-art methods.
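
The abstract describes meta-value regularization only at a high level. As a rough illustration of the general idea of tying world-model training to policy learning, the sketch below computes a value-consistency penalty in the spirit of value-aware model learning. It compares a value function's estimate on model-predicted next latents with its estimate on latents encoded from real next observations, then averages over scenarios that each have their own value function. This is a swapped-in generic technique, not MrCoM's actual regularizer; the function names, the scenario names `walk` and `run`, and the linear value functions are all hypothetical.

```python
import numpy as np


def value_consistency_penalty(value_fn, predicted_next, encoded_next):
    """Mean squared gap between values of model-predicted and encoded next latents."""
    gap = value_fn(predicted_next) - value_fn(encoded_next)
    return float(np.mean(gap ** 2))


def multi_scenario_penalty(value_fns, batches):
    """Average the per-scenario penalties; each scenario uses its own value function."""
    per_scenario = [
        value_consistency_penalty(value_fns[name], pred, enc)
        for name, (pred, enc) in batches.items()
    ]
    return float(np.mean(per_scenario))


# Toy linear value functions over 4-dimensional latents (hypothetical scenarios).
value_fns = {
    "walk": lambda z: z @ np.array([1.0, -0.5, 0.25, 0.0]),
    "run": lambda z: z @ np.array([0.0, 1.0, 0.0, 0.0]),
}
predicted = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]])
encoded = np.array([[1.0, 0.0, 0.0, 3.0], [0.0, 0.0, 0.0, 0.0]])
batches = {"walk": (predicted, encoded), "run": (predicted, encoded)}
print(multi_scenario_penalty(value_fns, batches))  # prints 1.25
```

In the `walk` scenario, the first predicted latent differs from its encoded target only in the last coordinate, which that value function ignores, so it adds nothing to the penalty. Prediction errors are penalized only insofar as they change the value estimate. This is one way a model loss can be made to reflect the policy's objective in each scenario.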

Paper Structure

This paper contains 40 sections, 6 theorems, 36 equations, 3 figures, 6 tables, and 2 algorithms.

Key Result

Lemma 3

Given the state representation error $\epsilon_{S}$ and the dynamics model error $\epsilon_{T}$, the dynamics error under the learned state representation is upper-bounded as follows:

Here, $C_T = \max_s \nabla_s \sum_a T(s^\prime \mid s, a)$ denotes the maximum derivative of the dynamics function with respect to $s$. It measures how sensitive the state transition function is to changes in the state.
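
To make the constant $C_T$ concrete, the sketch below estimates it numerically for a toy one-dimensional system. It assumes Gaussian dynamics $T(s' \mid s, a) = \mathcal{N}(s'; s + a, \sigma^2)$ with $\sigma = 0.5$, a discrete action set $\{-1, 0, 1\}$, and a fixed target state $s' = 0$. It takes central finite differences of $\sum_a T(s' \mid s, a)$ over a grid of $s$ values and reports the largest magnitude. The toy dynamics and the helper names are assumptions for illustration, not details from the paper. Taking the absolute value of the derivative is also an assumption.

```python
import numpy as np

ACTIONS = np.array([-1.0, 0.0, 1.0])  # assumed discrete action set
SIGMA = 0.5  # assumed transition noise (standard deviation)


def transition_density(s_next, s, a, sigma=SIGMA):
    """Toy dynamics T(s' | s, a) = N(s'; s + a, sigma^2); an assumption, not the paper's model."""
    z = (s_next - (s + a)) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def summed_transition(s_next, s):
    """sum_a T(s' | s, a) for a fixed target state s' (s may be an array)."""
    return sum(transition_density(s_next, s, a) for a in ACTIONS)


def estimate_c_t(s_next, s_grid, h=1e-4):
    """Approximate C_T as the largest |d/ds sum_a T(s' | s, a)| over s_grid."""
    upper = summed_transition(s_next, s_grid + h)
    lower = summed_transition(s_next, s_grid - h)
    grads = (upper - lower) / (2.0 * h)  # central finite differences
    return float(np.max(np.abs(grads)))


s_grid = np.linspace(-3.0, 3.0, 2001)
c_t = estimate_c_t(0.0, s_grid)
print(f"estimated C_T at s' = 0: {c_t:.3f}")
```

Shrinking $\sigma$ makes the transition density sharper, so its derivative with respect to $s$, and therefore $C_T$, grows. In other words, the dynamics become more sensitive to small changes in the state.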

Figures (3)

  • Figure 1: Framework of MrCoM, which merges multi-scenario data into a unified world-model. The meta-state regularization extracts scenario-relevant information, and meta-value regularization aligns world-model optimization with policy learning.
  • Figure 2: Model architectures of the compared algorithms. Compared to other methods, MrCoM implements a finer partitioning of the latent state space. In this figure, $\tilde{s}$ denotes the latent state, $d$ the deterministic latent state, $u$ the stochastic latent state, and $h$ the auxiliary state. In MrCoM, the latent state $\tilde{s}_t$ is the concatenation of $u_t$, $d_t$, and $h_t$.
  • Figure 3: Module ablation studies on various scenarios with five random seeds.
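
Figure 2 specifies only that $\tilde{s}_t$ concatenates $u_t$, $d_t$, and $h_t$. The sketch below shows that layout in NumPy, together with one hypothetical transition step:
- $d_t$ comes from a toy deterministic tanh layer over the previous latent and action.
- $u_t$ is sampled from a diagonal Gaussian via the reparameterization trick.
- $h_t$ is passed in directly.

The dimensions, the update rules, and the names `LatentState` and `next_latent` are assumptions in the style of Dreamer-like recurrent state-space models. They are not MrCoM's published architecture.

```python
from dataclasses import dataclass

import numpy as np

U_DIM, D_DIM, H_DIM, A_DIM = 4, 8, 2, 3  # hypothetical sizes


@dataclass
class LatentState:
    u: np.ndarray  # stochastic latent state u_t
    d: np.ndarray  # deterministic latent state d_t
    h: np.ndarray  # auxiliary state h_t

    def concat(self) -> np.ndarray:
        """s~_t as the concatenation of u_t, d_t and h_t."""
        return np.concatenate([self.u, self.d, self.h])


def next_latent(prev, action, W_d, mu, log_std, h_next, rng):
    """One hypothetical transition step.

    d_t: deterministic tanh layer over the previous concatenated latent and the action.
    u_t: reparameterized sample from N(mu, exp(log_std)^2); mu and log_std would
         normally come from a network, here they are passed in.
    h_t: auxiliary features supplied by the caller.
    """
    d = np.tanh(W_d @ np.concatenate([prev.concat(), action]))
    u = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
    return LatentState(u=u, d=d, h=h_next)


rng = np.random.default_rng(0)
prev = LatentState(np.zeros(U_DIM), np.zeros(D_DIM), np.zeros(H_DIM))
W_d = rng.normal(scale=0.1, size=(D_DIM, U_DIM + D_DIM + H_DIM + A_DIM))
state = next_latent(prev, np.ones(A_DIM), W_d, mu=np.zeros(U_DIM),
                    log_std=np.zeros(U_DIM), h_next=np.ones(H_DIM), rng=rng)
print(state.concat().shape)  # (14,)
```

Keeping the three parts as separate fields makes it straightforward to attach different losses or regularizers to each component. Downstream modules still consume the single concatenated vector.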

Theorems & Definitions (14)

  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Theorem 6
  • proof
  • proof
  • proof
  • ...and 4 more