PoseGaussian: Pose-Driven Novel View Synthesis for Robust 3D Human Reconstruction
Ju Shen, Chen Chen, Tam V. Nguyen, Vijayan K. Asari
TL;DR
PoseGaussian targets robust, real-time novel view synthesis of dynamic humans by integrating pose into a Gaussian Splatting pipeline as both a structural prior and a temporal cue. The method fuses pose heatmaps with image features for depth inference and uses a Temporal Pose Stabilizer to maintain temporal coherence, reaching state-of-the-art perceptual and structural quality while rendering at around 100 FPS. Key contributions are pose-guided depth fusion, temporal pose stabilization, and a pose-conditioned loss that aligns fused features with pose encodings, which together enable robust generalization across datasets.
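As a rough illustration of what fusing pose heatmaps with image features can look like, the sketch below renders 2D keypoints into per-joint Gaussian heatmaps and concatenates them channel-wise with an image feature map. The function names (`keypoint_heatmap`, `fuse_channels`), the `sigma` value, and the use of plain channel concatenation are all assumptions made for illustration; the summary above does not specify how the fusion is implemented.

```python
import math

def keypoint_heatmap(x, y, height, width, sigma=2.0):
    """Render one 2D keypoint as an unnormalized Gaussian heatmap (peak value 1.0)."""
    return [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2.0 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]

def fuse_channels(image_features, keypoints, sigma=2.0):
    """Append one heatmap channel per keypoint to the image feature channels.

    image_features: list of channels, each a height x width nested list.
    keypoints: list of (x, y) pixel coordinates.
    Channel concatenation is an assumed fusion scheme, not a confirmed design.
    """
    height = len(image_features[0])
    width = len(image_features[0][0])
    heatmaps = [keypoint_heatmap(x, y, height, width, sigma) for x, y in keypoints]
    return image_features + heatmaps

# Toy usage: 3 feature channels of size 6 x 8, plus 2 hypothetical keypoints.
features = [[[0.0] * 8 for _ in range(6)] for _ in range(3)]
fused = fuse_channels(features, [(2, 3), (5, 1)])
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a learned pipeline the fused tensor would typically feed a depth network; concatenation is only the simplest option, and attention-based or additive fusion would be equally plausible readings of the description.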
Abstract
We propose PoseGaussian, a pose-guided Gaussian Splatting framework for high-fidelity human novel view synthesis. Human body pose serves a dual purpose in our design: as a structural prior, it is fused with a color encoder to refine depth estimation; as a temporal cue, it is processed by a dedicated pose encoder to enhance temporal consistency across frames. These components are integrated into a fully differentiable, end-to-end trainable pipeline. Unlike prior works that use pose only as a condition or for warping, PoseGaussian embeds pose signals into both geometric and temporal stages to improve robustness and generalization. It is specifically designed to address challenges inherent in dynamic human scenes, such as articulated motion and severe self-occlusion. Notably, our framework achieves real-time rendering at 100 FPS, maintaining the efficiency of standard Gaussian Splatting pipelines. We validate our approach on ZJU-MoCap, THuman2.0, and in-house datasets, demonstrating state-of-the-art performance in perceptual quality and structural accuracy (PSNR 30.86, SSIM 0.979, LPIPS 0.028).
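For readers less familiar with the reported metrics, PSNR is derived from the mean squared error between a rendered image and its ground-truth view, and higher values are better. The sketch below shows the standard definition, assuming pixel values flattened into a list and scaled to [0, 1]; the function name `psnr` and the `max_value` parameter are illustrative, and this is not the paper's evaluation code. SSIM and LPIPS are omitted because they need windowed statistics and a pretrained network, respectively.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not rendered or len(rendered) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy usage: every pixel is off by 0.1, so the MSE is 0.01.
print(round(psnr([0.5, 0.5], [0.6, 0.4]), 2))
```

Because the scale is logarithmic, each 10 dB gain corresponds to a tenfold reduction in mean squared error.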
