D$^2$-World: An Efficient World Model through Decoupled Dynamic Flow

Haiming Zhang; Xu Yan; Ying Xue; Zixuan Guo; Shuguang Cui; Zhen Li; Bingbing Liu

TL;DR

D$^2$-World is a novel World model that forecasts future point clouds through Decoupled Dynamic flow. It achieves state-of-the-art performance on the OpenScene Predictive World Model benchmark, securing second place in the challenge, and trains more than 300% faster than the baseline model.

Abstract

This technical report summarizes the second-place solution for the Predictive World Model Challenge held at the CVPR-2024 Workshop on Foundation Models for Autonomous Systems. We introduce D$^2$-World, a novel World model that effectively forecasts future point clouds through Decoupled Dynamic flow. Specifically, past semantic occupancies are first obtained from an existing occupancy network (e.g., BEVDet). These occupancy results then serve as input to a single-stage world model, which generates future occupancy in a non-autoregressive manner. To further simplify the task, the world model decouples dynamic voxels from static ones: it generates future dynamic voxels by warping the existing observations through voxel flow, while the remaining static voxels are obtained directly through pose transformation. As a result, our approach achieves state-of-the-art performance on the OpenScene Predictive World Model benchmark, securing second place, and trains more than 300% faster than the baseline model. Code is available at https://github.com/zhanghm1995/D2-World.
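
The decoupling idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes an integer voxel grid, a given dynamic mask, a flow field already expressed in voxel units (in the actual model the flow is predicted by the network), and a relative ego pose given as a 4x4 transform in voxel units. The names `forecast_occupancy`, `transform_voxels`, and `FREE` are hypothetical.

```python
import numpy as np

FREE = 0  # hypothetical label for empty space


def transform_voxels(coords, pose):
    """Apply a 4x4 rigid transform to voxel coordinates of shape (N, 3)."""
    homo = np.concatenate([coords, np.ones((len(coords), 1))], axis=1)
    return np.rint(homo @ pose.T)[:, :3].astype(int)


def forecast_occupancy(occ, dynamic_mask, flow, pose):
    """Predict the next occupancy grid from the current one (simplified sketch).

    occ:          (X, Y, Z) int array of semantic labels, FREE for empty voxels
    dynamic_mask: (X, Y, Z) bool array, True where a voxel belongs to a moving object
    flow:         (X, Y, Z, 3) per-voxel displacement, in voxel units (assumed given)
    pose:         (4, 4) transform from the current to the future ego frame, in voxel units
    """
    future = np.full_like(occ, FREE)
    coords = np.argwhere(occ != FREE)  # (N, 3) indices of occupied voxels
    labels = occ[tuple(coords.T)]
    is_dyn = dynamic_mask[tuple(coords.T)]

    # Dynamic voxels are warped by their flow; static voxels are left in place.
    moved = coords.astype(float)
    moved[is_dyn] += flow[tuple(coords[is_dyn].T)]

    # Both groups are then re-expressed in the future ego frame.
    target = transform_voxels(moved, pose)

    # Voxels that leave the grid are dropped.
    inside = np.all((target >= 0) & (target < np.array(occ.shape)), axis=1)
    future[tuple(target[inside].T)] = labels[inside]
    return future


# Toy scene: one static voxel (label 1) and one dynamic voxel (label 2).
occ = np.zeros((4, 4, 1), dtype=int)
occ[0, 0, 0] = 1
occ[1, 1, 0] = 2
dynamic_mask = occ == 2
flow = np.zeros((4, 4, 1, 3))
flow[1, 1, 0] = [1, 0, 0]  # the dynamic voxel moves one cell along x
pose = np.eye(4)
pose[1, 3] = 1             # ego motion shifts every voxel one cell along y
future = forecast_occupancy(occ, dynamic_mask, flow, pose)
```

In this toy scene the static voxel is relocated by the pose transform alone, while the dynamic voxel is first displaced by its flow and then transformed. Only the dynamic part needs a learned motion estimate, which is what makes the decoupled formulation simpler than predicting the whole future grid.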
