Motion Dreamer: Boundary Conditional Motion Reasoning for Physically Coherent Video Generation
Tianshuo Xu, Zhifei Chen, Leyi Wu, Hao Lu, Yuying Chen, Lihui Jiang, Bingbing Liu, Yingcong Chen
TL;DR
Motion Dreamer tackles boundary conditional motion reasoning by explicitly separating motion inference from visual synthesis. It introduces instance flow to translate partial user cues into dense, physically coherent motion fields and employs a motion inpainting strategy to infer missing dynamics, guided by a two-stage diffusion+decoder pipeline built on CogVideoX. Evaluations on Physion and a large driving dataset show superior motion coherence and realism compared with state-of-the-art methods, with ablations validating the value of intermediate motion representations and motion-enhancement losses. The approach advances practical boundary-conditioned video generation for autonomous driving and embodied AI, with code and data forthcoming.
Abstract
Recent advances in video generation have shown promise for generating future scenarios, which is critical for planning and control in autonomous driving and embodied intelligence. However, real-world applications demand more than visually plausible predictions; they require reasoning about object motions based on explicitly defined boundary conditions, such as an initial scene image and partial object motion. We term this capability Boundary Conditional Motion Reasoning. Current approaches either neglect explicit user-defined motion constraints, producing physically inconsistent motions, or demand complete motion inputs, which are rarely available in practice. Here we introduce Motion Dreamer, a two-stage framework that addresses these limitations by explicitly separating motion reasoning from visual synthesis. Our approach introduces instance flow, a sparse-to-dense motion representation that enables effective integration of partial user-defined motions, and a motion inpainting strategy that robustly infers the motions of other objects. Extensive experiments demonstrate that Motion Dreamer significantly outperforms existing methods, achieving superior motion plausibility and visual realism, thus bridging the gap towards practical boundary conditional motion reasoning. Our webpage is available at https://envision-research.github.io/MotionDreamer/.
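To make the sparse-to-dense idea concrete, here is a toy sketch of how a single per-instance motion cue could be spread over that instance's mask to form a dense field, with a companion map marking which pixels still need their motion inferred (the part a motion inpainting step would fill in). This is an illustration under stated assumptions, not the authors' implementation: the function name `instance_flow`, the integer-ID mask layout, the `(dx, dy)` cue format, and the `known` map are all hypothetical.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]  # hypothetical (dx, dy) displacement in pixels


def instance_flow(
    mask: List[List[int]], cues: Dict[int, Vec]
) -> Tuple[List[List[Vec]], List[List[int]]]:
    """Spread one motion cue per instance over every pixel of that instance.

    mask: 2D grid of integer instance IDs (assumed layout, 0 = background).
    cues: sparse user input mapping instance ID -> (dx, dy).
    Returns a dense flow grid and a 0/1 `known` grid; pixels with known == 0
    have no user-provided motion and would be left for a model to infer.
    """
    flow = [[(0.0, 0.0) for _ in row] for row in mask]
    known = [[0 for _ in row] for row in mask]
    for y, row in enumerate(mask):
        for x, inst in enumerate(row):
            if inst in cues:
                flow[y][x] = cues[inst]  # same vector for the whole instance
                known[y][x] = 1
    return flow, known


# Two instances in a 2x3 scene; the user specifies motion only for instance 1.
mask = [[0, 1, 1],
        [2, 2, 0]]
flow, known = instance_flow(mask, {1: (2.0, 0.0)})
```

In this sketch, instance 2 and the background remain marked as unknown, which mirrors the setting described above: only partial object motion is given, and the rest must be reasoned about.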
