DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu
TL;DR
Occluded 3D human mesh recovery is challenging due to weak image features under occlusion. This paper introduces DPMesh, which leverages a pre-trained diffusion model as a one-step image backbone with conditional control from 2D cues and a Noisy Key-point Reasoning module to exploit diffusion priors for occluded pose estimation. The method regresses SMPL parameters through a VQVAE-based pose representation guided by cross-attention maps and diffusion priors, without iterative denoising. Across occlusion and standard benchmarks, DPMesh achieves state-of-the-art performance, especially in crowded or heavily occluded scenes, highlighting the practical value of diffusion priors for perception tasks.
Abstract
Recovering occluded human meshes is challenging for current methods because effective image features are difficult to extract under severe occlusion. In this paper, we introduce DPMesh, a framework for occluded human mesh recovery that exploits the prior knowledge of object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model. Unlike previous methods that rely on conventional backbones for feature extraction, DPMesh uses the pre-trained denoising U-Net as its image backbone and performs single-step inference to provide occlusion-aware information. To improve perception of occluded poses, DPMesh injects conditions derived from 2D observations, which act as guidance controls for the denoising U-Net. Furthermore, we explore a dedicated noisy key-point reasoning approach to mitigate disturbances arising from occlusion and crowded scenes. This strategy makes fuller use of the perceptual capability of the diffusion prior and thereby improves accuracy. Extensive experiments show that our framework outperforms state-of-the-art methods on both occlusion-specific and standard datasets. The results underscore its ability to achieve precise and robust 3D human mesh recovery, particularly in challenging scenarios involving occlusion and crowding.
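To make the idea of conditioning on unreliable 2D observations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that 2D key-points are encoded as Gaussian heatmaps before being passed to the backbone as a spatial condition, and that occlusion-induced detector errors can be simulated by jittering or dropping key-points. The function names `perturb_keypoints` and `render_heatmaps`, the noise model, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def perturb_keypoints(keypoints, sigma, drop_prob, rng):
    """Simulate unreliable 2D detections (assumed noise model, not from the paper).

    Each (x, y) key-point is either dropped (returned as None) with
    probability drop_prob, or jittered with zero-mean Gaussian noise
    of standard deviation sigma, in pixels.
    """
    noisy = []
    for x, y in keypoints:
        if rng.random() < drop_prob:
            noisy.append(None)  # key-point missed, e.g. hidden by an occluder
        else:
            noisy.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
    return noisy


def render_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per key-point as nested lists [K][H][W].

    Dropped key-points (None) produce an all-zero map, so the condition
    carries no evidence for that joint.
    """
    maps = []
    for kp in keypoints:
        if kp is None:
            maps.append([[0.0] * width for _ in range(height)])
            continue
        x0, y0 = kp
        maps.append([
            [math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ])
    return maps


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [(8.0, 4.0), (20.0, 10.0)]  # hypothetical detections in pixels
    noisy = perturb_keypoints(clean, sigma=1.5, drop_prob=0.3, rng=rng)
    condition = render_heatmaps(noisy, height=16, width=32)
    print(len(condition), len(condition[0]), len(condition[0][0]))
```

In a training setup of this kind, the heatmaps built from perturbed key-points would stand in for detector output on occluded images, so that the model learns not to trust the 2D condition blindly. How DPMesh actually builds its conditions and its reasoning module is specified in the paper body, not in this abstract.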
