Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
Wei Cao, Chang Luo, Biao Zhang, Matthias Nießner, Jiapeng Tang
TL;DR
Motion2VecSets addresses the ill-posed problem of reconstructing 4D non-rigid surfaces from sparse, noisy, and partial point clouds by learning a probabilistic 4D prior through diffusion over latent sets. It introduces a 4D neural representation with a shape latent set for the reference frame and deformation latent sets for frame-to-frame motion. The deformation latent sets are denoised synchronously, with Interleaved Spatio-Temporal Attention enforcing spatio-temporal coherence while keeping computation efficient. The method outperforms the compared baselines in 4D reconstruction and completion on the D-FAUST and DT4D-A datasets, including on unseen identities and motions, and is robust to partial observations. Potential extensions include multi-modal 4D generation and text-driven 4D synthesis.
Abstract
We introduce Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from point cloud sequences. While existing state-of-the-art methods have demonstrated success in reconstructing non-rigid objects using neural field representations, conventional feed-forward networks struggle with ambiguous observations from noisy, partial, or sparse point clouds. To address these challenges, we introduce a diffusion model that explicitly learns the shape and motion distribution of non-rigid objects through an iterative denoising process over compressed latent representations. The diffusion-based priors enable more plausible, probabilistic reconstructions when handling ambiguous inputs. We parameterize 4D dynamics with latent sets instead of global latent codes. This novel 4D representation allows us to learn local shape and deformation patterns, leading to more accurate capture of non-linear motion and significantly better generalization to unseen motions and identities. For more temporally coherent object tracking, we denoise the deformation latent sets synchronously and exchange information across multiple frames. To avoid computational overhead, we design an interleaved space and time attention block that alternately aggregates deformation latents along the spatial and temporal domains. Extensive comparisons against state-of-the-art methods demonstrate that Motion2VecSets outperforms them in 4D reconstruction from various imperfect observations. More detailed information can be found at https://vveicao.github.io/projects/Motion2VecSets/.
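The interleaved space and time attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`self_attention`, `interleaved_block`, `attention_pair_counts`), the tensor shape `(T, M, C)` for frames, latents per frame, and channels, and the absence of learned projections, multiple heads, and normalization are all assumptions made for illustration. The sketch also assumes that the latent at index `m` refers to the same element of the set in every frame, so that attending along the time axis is meaningful.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head self-attention with no learned weights (illustrative only).

    x has shape (batch, tokens, channels); attention runs over the tokens axis.
    """
    scale = 1.0 / np.sqrt(x.shape[-1])
    scores = (x @ np.swapaxes(x, -1, -2)) * scale  # (batch, tokens, tokens)
    return softmax(scores, axis=-1) @ x


def interleaved_block(latents):
    """Hypothetical interleaved block on latents of shape (T, M, C)."""
    # Spatial step: each frame is a batch element; attend across its M latents.
    latents = latents + self_attention(latents)
    # Temporal step: each latent index is a batch element; attend across T frames.
    per_latent = np.swapaxes(latents, 0, 1)  # (M, T, C)
    per_latent = per_latent + self_attention(per_latent)
    return np.swapaxes(per_latent, 0, 1)  # back to (T, M, C)


def attention_pair_counts(T, M):
    """Query-key pairs scored by joint attention vs. the interleaved scheme."""
    full = (T * M) ** 2               # one attention over all T*M tokens
    interleaved = T * M**2 + M * T**2  # T spatial passes + M temporal passes
    return full, interleaved


rng = np.random.default_rng(0)
T, M, C = 4, 8, 16
z = rng.standard_normal((T, M, C))
out = interleaved_block(z)
full, inter = attention_pair_counts(T, M)
print(out.shape, full, inter)
```

With these toy sizes, joint attention over all `T * M = 32` tokens scores 1024 query-key pairs, while the interleaved scheme scores 384. The gap widens as the number of frames or latents grows, which is one way to read the efficiency argument in the abstract.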
