RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images

Junjin Xiao; Qing Zhang; Yongwei Nie; Lei Zhu; Wei-Shi Zheng

TL;DR

RoGSplat tackles robust, generalizable novel-view synthesis of humans from sparse multi-view images without per-subject optimization. It first lifts SMPL vertices to dense, image-aligned 3D prior points using an SPD-based fusion of pixel-level and voxel-level features. It then regresses coarse 3D Gaussians from those points and refines them with a coarse-to-fine, pixel-wise Gaussian strategy guided by refined depth. Training follows a two-stage scheme with geometry-focused and texture-focused losses plus a depth refiner, yielding near-real-time inference and strong cross-dataset generalization. Empirically, RoGSplat outperforms state-of-the-art NeRF-based and 3D Gaussian Splatting methods on multiple benchmarks and is robust to SMPL misalignment. Loose clothing and facial detail reconstruction remain areas for improvement.

Abstract

This paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen humans from sparse multi-view images, without requiring cumbersome per-subject optimization. Unlike previous methods, which typically struggle with sparse views that have little overlap and are less effective at reconstructing complex human geometry, the proposed method enables robust reconstruction under such challenging conditions. Our key idea is to lift SMPL vertices to dense and reliable 3D prior points that represent accurate human body geometry, and then regress human Gaussian parameters from these points. To account for possible misalignment between the SMPL model and the images, we propose to predict image-aligned 3D prior points by leveraging both pixel-level and voxel-level features, from which we regress the coarse Gaussians. To better capture high-frequency details, we further render depth maps from the coarse 3D Gaussians to help regress fine-grained pixel-wise Gaussians. Experiments on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization. Our code is available at https://github.com/iSEE-Laboratory/RoGSplat.
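
The abstract's final stage uses depth maps rendered from the coarse Gaussians to help regress pixel-wise Gaussians. As a rough illustration of why a depth map is enough to place one 3D point per pixel, the sketch below back-projects depth through a standard pinhole camera model. It is not taken from the RoGSplat code: the function name `unproject_depth`, the intrinsics `fx, fy, cx, cy`, and the rule that non-positive depth marks background are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project each pixel with positive depth to a camera-space 3D point.

    Hypothetical helper: assumes a pinhole camera with focal lengths (fx, fy)
    and principal point (cx, cy), with depth measured along the camera z-axis.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v indexes image rows (y direction)
        for u, d in enumerate(row):        # u indexes image columns (x direction)
            if d <= 0.0:                   # assumption: non-positive depth = background
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


# Toy 2x2 depth map; the top-left pixel is background.
depth_map = [[0.0, 2.0],
             [2.0, 4.0]]
centers = unproject_depth(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Each returned point could serve as a candidate center for one pixel-wise Gaussian. In a full pipeline the remaining parameters (scale, rotation, opacity, color) would be predicted by a network, and the points would be transformed from camera space to world space with the camera extrinsics.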
