DPoser: Diffusion Model as Robust 3D Human Pose Prior
Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Yulun Zhang, Haoqian Wang
TL;DR
DPoser introduces an unconditional diffusion-based 3D human pose prior trained on SMPL pose representations and used as a regularizer within inverse-problem formulations of pose-related tasks. By combining variational diffusion sampling with test-time truncated timestep scheduling tailored to pose data, it achieves consistent improvements over state-of-the-art priors across human mesh recovery, pose completion, motion denoising, and pose generation. Key contributions include the unconditional diffusion prior, the test-time truncation strategy, and a comprehensive set of experiments and ablations that validate robustness and generalization. This work enables flexible, optimization-driven pose estimation pipelines with improved realism and diversity for single images and sequences.
Abstract
This work aims to construct a robust human pose prior, which remains a persistent challenge because of biomechanical constraints and the diversity of human movement. Traditional priors such as VAEs and NDFs often fall short in realism and generalization, notably on unseen noisy poses. To address these issues, we introduce DPoser, a robust and versatile human pose prior built upon diffusion models. DPoser regards various pose-centric tasks as inverse problems and employs variational diffusion sampling to solve them efficiently. Because it is designed to plug into optimization frameworks, DPoser directly benefits human mesh recovery, pose generation, pose completion, and motion denoising. Furthermore, because articulated poses differ in nature from structured images, we propose truncated timestep scheduling to enhance the effectiveness of DPoser. Compared with the uniform scheduling commonly used in image domains, our approach yields improvements of 5.4%, 17.2%, and 3.8% on human mesh recovery, pose completion, and motion denoising, respectively. Comprehensive experiments demonstrate the superiority of DPoser over existing state-of-the-art pose priors across multiple tasks.
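To make the contrast between uniform and truncated timestep scheduling concrete, the following is a minimal sketch on a toy pose-completion problem. It is not the authors' implementation, and several parts are assumptions made for illustration: the schedule endpoints (`t_start`, `t_end`), the linear annealing, the Gaussian stand-in for a trained diffusion network (`toy_denoiser`), and the simple "pull toward the denoised estimate" regularizer, which is only loosely in the spirit of variational diffusion sampling.

```python
import numpy as np

def uniform_timesteps(num_iters, rng, t_min=1e-3, t_max=1.0):
    """Baseline: draw each iteration's noise level uniformly from the full range."""
    return rng.uniform(t_min, t_max, size=num_iters)

def truncated_timesteps(num_iters, t_start=0.2, t_end=0.05):
    """Assumed truncated schedule: anneal linearly inside a low-noise sub-range.
    The endpoints are hypothetical, not values taken from the paper."""
    return np.linspace(t_start, t_end, num_iters)

def toy_denoiser(x_t, t, mu, s):
    """Exact posterior mean E[x0 | x_t] for a Gaussian prior N(mu, s^2 I)
    under x_t = x0 + t * eps. Stands in for a trained diffusion model."""
    k = s**2 / (s**2 + t**2)
    return mu + k * (x_t - mu)

def complete_pose(y, mask, timesteps, mu, s, rng, lr=0.1, weight=1.0):
    """Fill in unobserved entries of y by gradient steps on a data term
    plus a denoiser-based prior term (hypothetical weighting)."""
    x = np.where(mask, y, 0.0)                 # unobserved entries start at zero
    for t in timesteps:
        eps = rng.standard_normal(x.shape)
        x0_hat = toy_denoiser(x + t * eps, t, mu, s)   # treated as a constant
        grad = mask * (x - y) + weight * (x - x0_hat)  # data term + prior term
        x = x - lr * grad
    return x

rng = np.random.default_rng(0)
mu, s = np.ones(6), 0.1                        # toy "pose" prior: N(1, 0.1^2 I)
mask = np.array([True, True, True, False, False, False])
y = np.where(mask, 1.05, 0.0)                  # only the first three entries are observed

x_hat = complete_pose(y, mask, truncated_timesteps(200), mu, s, rng)
print(np.round(x_hat, 2))
```

In this toy setting the unobserved entries are drawn toward the prior mean while the observed entries stay close to their measurements. Swapping `truncated_timesteps(200)` for `uniform_timesteps(200, rng)` runs the same loop with noise levels drawn from the full range, which is the kind of comparison the abstract refers to, although the toy problem says nothing about the reported gains.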
