
EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Multi-view Camera Settings

Yingdong Hu, Zhening Liu, Jiawei Shao, Zehong Lin, Jun Zhang

TL;DR

EVA-Gaussian tackles real-time, high-resolution 3D human novel view synthesis under diverse sparse-view camera configurations. It introduces Efficient cross-View Attention (EVA) within a three-stage Gaussian-based pipeline (Gaussian Position Estimation, Gaussian Attribute Estimation, and Feature Refinement), augmented by an anchor loss that enforces cross-view consistency. On THuman2.0 and THumanSit, EVA-Gaussian achieves state-of-the-art rendering quality with robust generalization and real-time inference, even as the number of views or the angle between views increases. The work enables practical free-viewpoint rendering for AR/VR and holographic applications without relying on template priors; it also notes that memory usage becomes a concern at high view counts and that RGB-D inputs could further improve results.
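The Gaussian Attribute Estimation stage predicts, for each Gaussian, an opacity, a scale, a rotation quaternion, and a feature. The paper's exact parameterization is not stated here, so the sketch below assumes the convention common in 3D Gaussian Splatting: a sigmoid for opacity, an exponential for scale, unit normalization for the quaternion, and a covariance built as Σ = R S Sᵀ Rᵀ. All function names are hypothetical, and the per-Gaussian feature is omitted.

```python
import numpy as np

def activate_gaussian_attributes(raw_opacity, raw_scale, raw_quat):
    """Map unconstrained network outputs to valid 3D Gaussian attributes.

    Assumption: common 3DGS activations, not necessarily EVA-Gaussian's own.
    raw_opacity: (N,), raw_scale: (N, 3), raw_quat: (N, 4) in (w, x, y, z) order.
    """
    opacity = 1.0 / (1.0 + np.exp(-raw_opacity))                         # sigmoid -> (0, 1)
    scale = np.exp(raw_scale)                                             # strictly positive
    quat = raw_quat / np.linalg.norm(raw_quat, axis=-1, keepdims=True)   # unit length
    return opacity, scale, quat

def quat_to_rotmat(q):
    """Convert unit quaternions (N, 4) in (w, x, y, z) order to rotation matrices (N, 3, 3)."""
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], axis=-1),
        np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], axis=-1),
        np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=-2)

def covariance(scale, quat):
    """Build Sigma = R S S^T R^T with S = diag(scale), one 3x3 matrix per Gaussian."""
    R = quat_to_rotmat(quat)
    M = R * scale[:, None, :]               # equals R @ diag(s): column j of R scaled by s_j
    return M @ np.transpose(M, (0, 2, 1))   # M M^T = R S S^T R^T
```

Building Σ from a positive scale and a rotation, rather than regressing its six entries directly, keeps every covariance symmetric positive definite regardless of what the network outputs.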

Abstract

Feed-forward 3D Gaussian Splatting methods have demonstrated exceptional capability in real-time novel view synthesis for human models. However, current approaches are confined to either dense viewpoint configurations or restricted image resolutions. These limitations hinder their flexibility in free-viewpoint rendering across a wide range of camera view-angle discrepancies, and also restrict their ability to recover fine-grained human details in real time on commonly available GPUs. To address these challenges, we propose a novel pipeline named EVA-Gaussian for 3D human novel view synthesis across diverse multi-view camera settings. Specifically, we first design an Efficient Cross-View Attention (EVA) module to effectively fuse cross-view information under high-resolution inputs and sparse-view settings, while minimizing time and computational overhead. Additionally, we introduce a feature refinement mechanism that predicts the attributes of the 3D Gaussians and assigns a feature value to each Gaussian, enabling the correction of artifacts caused by geometric inaccuracies in position estimation and enhancing overall visual fidelity. Experimental results on the THuman2.0 and THumanSit datasets showcase the superiority of EVA-Gaussian in rendering quality across diverse camera settings. Project page: https://zhenliuzju.github.io/huyingdong/EVA-Gaussian.
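Window-based cross-view attention saves compute by restricting each query to a local window: with window size w, each of the H·W query tokens attends to w² tokens instead of H·W, so the cost scales as O(H·W·w²) rather than O((H·W)²). The sketch below is a generic Swin-style shifted-window cross-attention in NumPy, not the authors' EVA implementation. It assumes two views whose feature maps are aligned well enough that windows at the same position correspond, H and W divisible by the window size, and it omits the attention mask Swin applies to wrapped-around tokens after a cyclic shift. All function names are hypothetical.

```python
import numpy as np

def window_partition(feat, win):
    """Split an (H, W, C) feature map into (num_windows, win*win, C) tokens."""
    H, W, C = feat.shape
    x = feat.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def window_merge(tokens, H, W, win):
    """Inverse of window_partition: (num_windows, win*win, C) -> (H, W, C)."""
    C = tokens.shape[-1]
    x = tokens.reshape(H // win, W // win, win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_window_attention(feat_q, feat_kv, Wq, Wk, Wv, win=4, shift=0):
    """Queries from one view attend to keys/values of another view inside
    same-position windows (hypothetical sketch; assumes spatially aligned views).
    The mask for tokens wrapped around by the cyclic shift is omitted for brevity."""
    H, W, _ = feat_q.shape
    if shift:
        feat_q = np.roll(feat_q, (-shift, -shift), axis=(0, 1))
        feat_kv = np.roll(feat_kv, (-shift, -shift), axis=(0, 1))
    q = window_partition(feat_q, win) @ Wq      # (num_windows, win*win, d)
    k = window_partition(feat_kv, win) @ Wk
    v = window_partition(feat_kv, win) @ Wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))  # per-window scores
    out = window_merge(attn @ v, H, W, win)
    if shift:
        out = np.roll(out, (shift, shift), axis=(0, 1))  # undo the cyclic shift
    return out
```

Alternating shift=0 and shift=win // 2 across layers lets information flow across window borders, which is the usual motivation for shifted windows.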

Paper Structure (15 sections, 11 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Qualitative comparison of novel view synthesis on the THuman2.0 dataset, with a 72-degree angle between the two stereo views; GT denotes the ground truth. We compare our proposed EVA-Gaussian against the state-of-the-art approaches GPS-Gaussian [gpsgs] and ENeRF [enerf]. The quantitative metrics PSNR (↑), SSIM (↑), LPIPS (↓), and inference time (↓) show that EVA-Gaussian achieves superior reconstruction quality while enabling real-time reconstruction under sparse-view, high-resolution settings.
  • Figure 2: Framework of EVA-Gaussian. EVA-Gaussian takes sparse-view images captured around a human subject as input and performs three key stages: (1) estimating the positions of 3D Gaussians, (2) inferring the remaining attributes (i.e., opacities, scales, quaternions, and features) of these Gaussians, and (3) refining the output image in a recurrent manner.
  • Figure 3: Efficient cross-View Attention (EVA) module for 3D Gaussian position estimation. EVA takes multi-view image features as input, partitions them into window patches using a shifted-window scheme, and performs cross-attention between the features from different views.
  • Figure 4: Qualitative comparison on THuman2.0 and THumanSit. EVA-Gaussian achieves superior novel view rendering quality under diverse camera settings. Additional visualization results are provided in Appendix C.
  • Figure 5: Visualization of cross-domain evaluation results for EVA-Gaussian. The left side shows results rendered by EVA-Gaussian trained on THuman2.0 and evaluated on THumanSit; the right side shows results from EVA-Gaussian trained on THumanSit and evaluated on THuman2.0.
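Figure 1 reports PSNR, SSIM, LPIPS, and inference time. For reference, PSNR is 10·log10(MAX²/MSE) in decibels, so higher is better. The minimal sketch below assumes images stored as float arrays scaled to [0, 1] (pass max_val=255 for 8-bit images); the function name is hypothetical. SSIM, which compares local means, variances, and covariances, and LPIPS, which relies on a pretrained network, are not sketched here.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01 and therefore PSNR = 20 dB.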