LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion

Donghwan Kim; Tae-Kyun Kim

TL;DR

LieHMR tackles monocular human mesh recovery by learning an image-conditioned distribution over $SO(3)$ pose parameters with a diffusion model. The architecture separates a time-independent transformer, which captures relationships between joints, from a per-joint, time-dependent denoiser that operates on $SO(3)$; this enables both image-conditioned and unconditional generation. Trained with a hybrid supervised/self-supervised strategy, LieHMR achieves strong single-output performance and diverse multi-output samples, surpassing several probabilistic baselines and competing with state-of-the-art deterministic methods. The approach generates robustly under occlusion and depth ambiguity, with practical implications for realistic human motion modeling in vision and graphics. Its main acknowledged limitation is the inference cost of diffusion sampling, which leaves room for further acceleration and multimodal extensions.

Abstract

We tackle the problem of Human Mesh Recovery (HMR) from a single RGB image, formulating it as image-conditioned human pose and shape generation. Although recovering 3D human pose from 2D observations is inherently ambiguous, most existing approaches regress a single deterministic output. Probabilistic methods address this by generating multiple plausible outputs to model the ambiguity. However, these methods often exhibit a trade-off between accuracy and sample diversity, and their single predictions are not competitive with state-of-the-art deterministic models. To overcome these limitations, we propose a novel approach that models a distribution well aligned with the 2D observations. In particular, we introduce an $SO(3)$ diffusion model that generates the distribution of pose parameters, represented as 3D rotations, both unconditionally and conditioned on image observations via conditioning dropout. Our model learns the hierarchical structure of human body joints using a transformer. Instead of using the transformer as the denoising model, a time-independent transformer extracts a latent vector for each joint, and a small MLP-based denoising model learns the per-joint distribution conditioned on that latent vector. We demonstrate and analyze experimentally that our model predicts accurate pose probability distributions.
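To make the phrase "conditioning dropout" concrete, the following is a minimal sketch of the general idea, not the authors' implementation. During training, the image condition is occasionally replaced by a "null" condition, so that a single model can learn both the conditional and the unconditional distribution. The function name `apply_conditioning_dropout`, the all-zero null feature, and the drop probability of 0.1 are illustrative assumptions that do not come from the paper.

```python
import random


def apply_conditioning_dropout(image_feature, p_drop, rng):
    """Return the image feature, or an all-zero 'null' feature with probability p_drop.

    The zero vector as the null condition is an assumption made for this sketch.
    """
    if rng.random() < p_drop:
        return [0.0] * len(image_feature)
    return list(image_feature)


# Hypothetical usage: roughly 10% of the training samples lose their condition.
rng = random.Random(0)
feature = [0.3, -1.2, 0.7]  # stand-in for an image feature vector
batch = [apply_conditioning_dropout(feature, 0.1, rng) for _ in range(1000)]
n_dropped = sum(1 for f in batch if all(v == 0.0 for v in f))
```

At sampling time, such a model could be queried with the real image feature for conditional generation, or with the null feature for unconditional generation.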
