GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
TL;DR
This work tackles real-time, high-fidelity novel view synthesis (NVS) of human performers captured by sparse-view cameras. It introduces GPS-Gaussian, a generalizable pixel-wise 3D Gaussian Splatting framework that regresses Gaussian parameter maps on the source views and unprojects them to 3D using a jointly trained depth estimation module, all within a differentiable rendering loop. The approach achieves 2K-resolution rendering at over 25 FPS without fine-tuning and outperforms state-of-the-art methods (ENeRF, FloRen, 3D-GS) on synthetic and real data, while remaining robust to view sparsity and occlusions. By leveraging large-scale human priors and a two-view depth-guided Gaussian representation, GPS-Gaussian enables instant, interactive, and scalable human NVS suitable for applications such as holographic displays and immersive media.
Abstract
We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in real time. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods, which require per-subject optimization, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module that lifts the 2D parameter maps to 3D space. The proposed framework is fully differentiable. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while also exceeding them in rendering speed.
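The lifting step mentioned in the abstract can be illustrated with a minimal sketch of pixel-wise depth unprojection. This is not the authors' implementation: it assumes a standard pinhole camera model, and the function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. Each foreground pixel with a positive depth becomes one 3D point in camera space, which could then serve as the position of the Gaussian regressed at that pixel.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3]:
    """Lift each valid pixel of a depth map to a 3D point in camera space.

    Assumes a pinhole camera (an illustrative assumption, not taken from
    the paper). Pixels with depth <= 0 are treated as background and skipped.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):      # v: pixel row (image y)
        for u, d in enumerate(row):      # u: pixel column (image x)
            if d <= 0:
                continue                 # no Gaussian for background pixels
            x = (u - cx) * d / fx        # inverse of u = fx * x / z + cx
            y = (v - cy) * d / fy        # inverse of v = fy * y / z + cy
            points.append((x, y, d))
    return points


# Toy 2x2 depth map; the top-left pixel is background.
toy_depth = [[0.0, 2.0], [2.0, 4.0]]
toy_points = unproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In a full pipeline the resulting camera-space points would additionally be transformed to world space with the camera extrinsics, and the remaining per-pixel Gaussian properties would be read from the regressed parameter maps at the same pixel locations.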
