MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE
Ruijie Zhu, Jiahao Lu, Wenbo Hu, Xiaoguang Han, Jianfei Cai, Ying Shan, Chuanxia Zheng
TL;DR
MotionCrafter is a diffusion-based framework that jointly reconstructs dense 4D geometry and dense scene motion from monocular video. It builds a unified 4D latent by coupling a Geometry VAE with a Motion VAE, which enables end-to-end feed-forward reconstruction without post-optimization and improves on the state of the art in both geometry and scene flow estimation in world coordinates. A key finding is that the 4D values and latents need not be strictly aligned with the RGB latent distribution of the diffusion prior; a relaxed normalization and two-stage VAE training suffice to leverage pre-trained video priors effectively. The approach yields robust, temporally coherent 4D reconstructions on diverse datasets, with practical implications for video understanding, robotics, and world-model learning, and it points to multi-modal extensions as future work.
Abstract
We introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, together with a novel 4D VAE that learns this representation effectively. Unlike prior work that forces the 3D values and latents to align strictly with RGB VAE latents, despite their fundamentally different distributions, we show that such alignment is unnecessary and leads to suboptimal performance. Instead, we introduce a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments across multiple datasets demonstrate that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion reconstruction, respectively, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
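To make the joint representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each frame's point map is a flattened list of 3D points in one shared world frame, and that the scene flow for frame t is the per-point displacement to frame t+1. The function names (`advect`, `normalize_joint`) and the particular normalization (center on the mean point, divide points and flows by one shared scale) are hypothetical choices for illustration; the abstract does not specify the actual normalization.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Frame = List[Vec3]  # flattened H*W point map (or flow map) for one frame


def advect(points: Frame, flow: Frame) -> Frame:
    """Move each 3D point by its scene-flow vector (both in the same world frame)."""
    return [(p[0] + f[0], p[1] + f[1], p[2] + f[2]) for p, f in zip(points, flow)]


def normalize_joint(point_maps: List[Frame], flows: List[Frame]):
    """Hypothetical joint normalization (an assumption, not the paper's recipe).

    Points are centered on their global mean and divided by a single scale;
    flows are displacements, so they are only divided by that same scale.
    """
    all_pts = [p for frame in point_maps for p in frame]
    n = len(all_pts)
    center = tuple(sum(p[i] for p in all_pts) / n for i in range(3))
    # Shared scale: mean Euclidean distance of all points from the center.
    scale = sum(
        sum((p[i] - center[i]) ** 2 for i in range(3)) ** 0.5 for p in all_pts
    ) / n
    scale = max(scale, 1e-8)  # guard against degenerate (single-point) scenes
    norm_pts = [
        [tuple((p[i] - center[i]) / scale for i in range(3)) for p in frame]
        for frame in point_maps
    ]
    norm_flows = [
        [tuple(f[i] / scale for i in range(3)) for f in frame] for frame in flows
    ]
    return norm_pts, norm_flows, center, scale


# Toy two-frame scene: two points that move one unit along +z between frames.
point_maps = [[(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)], [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0)]]
flows = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
norm_pts, norm_flows, center, scale = normalize_joint(point_maps, flows)
```

Because points and flows share one scale, advecting the normalized points of frame 0 by the normalized flow lands on the normalized points of frame 1, so geometry and motion stay mutually consistent after normalization. This is one reason a shared coordinate system is convenient for a joint representation.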
