Evaluating Human Perception of Novel View Synthesis: Subjective Quality Assessment of Gaussian Splatting and NeRF in Dynamic Scenes

Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian, Lu Zhang

TL;DR

This work addresses human-centered quality assessment for novel view synthesis (NVS) in dynamic scenes by benchmarking Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) methods. It introduces an NVS quality assessment (NVSQA) benchmark built on 13 real-world Source Sequences (SRCs), covering 360°, front-facing, and single-viewpoint Pre-rendered Video Sequences (PVS), and uses two subjective experiments to analyze the effect of the viewing path. A broad evaluation of objective metrics shows that current IQA/VQA metrics struggle to predict perceived quality faithfully. Across the benchmark, GS-based methods generally outperform NeRF-based approaches, with STGFS leading in dynamic scenarios. By providing dynamic real-world data and a cross-method evaluation framework, the dataset establishes a foundation for future metric development and for improvements in NVS methods.

Abstract

Gaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking technologies that have revolutionized the field of Novel View Synthesis (NVS), enabling immersive photorealistic rendering and user experiences by synthesizing novel viewpoints from a sparse set of input images. The potential applications of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling, and realistic medical organ imaging, underscore the importance of assessing the quality of NVS methods from the perspective of human perception. Although some previous studies have explored subjective quality assessment for NVS, they still face several challenges, especially in the selection of NVS methods, scenario coverage, and evaluation methodology. To address these challenges, we conducted two subjective experiments on the quality of NVS technologies, including both GS-based and NeRF-based methods, with a focus on dynamic and real-world scenes. This study covers 360°, front-facing, and single-viewpoint videos and provides a larger and more diverse set of real scenes. It is also the first study to explore the impact of NVS methods in dynamic scenes with moving objects. The two types of subjective experiments help to characterize the influence of different viewing paths from a human perception perspective and pave the way for the future development of full-reference and no-reference quality metrics. In addition, we established a comprehensive benchmark of various state-of-the-art objective metrics on the proposed database, which shows that existing metrics still struggle to capture subjective quality accurately. The results offer insights into the limitations of existing NVS methods and may promote the development of new ones.
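
As an illustration of what benchmarking objective metrics against subjective quality usually involves, the sketch below averages per-observer ratings into a mean opinion score (MOS) per PVS and then measures how well a metric's scores track the MOS with Pearson and Spearman correlation. This is a minimal sketch, not the authors' protocol: the abstract does not state which correlation criteria or score processing were used, and all PVS names, ratings, and metric values are hypothetical.

```python
from math import sqrt
from statistics import mean


def mean_opinion_scores(ratings):
    """Average the per-observer ratings of each PVS into one MOS value."""
    return {pvs: mean(scores) for pvs, scores in ratings.items()}


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings on a 1-5 scale (one list of observer scores per PVS).
ratings = {
    "pvs_a": [4, 5, 4, 4],
    "pvs_b": [2, 3, 2, 2],
    "pvs_c": [3, 3, 4, 3],
    "pvs_d": [1, 2, 1, 2],
}
# Hypothetical scores from some objective metric (higher means better).
metric_scores = {"pvs_a": 0.91, "pvs_b": 0.78, "pvs_c": 0.74, "pvs_d": 0.55}

mos = mean_opinion_scores(ratings)
names = sorted(mos)
mos_values = [mos[n] for n in names]
metric_values = [metric_scores[n] for n in names]

print("PLCC:", round(pearson(metric_values, mos_values), 3))
print("SROCC:", round(spearman(metric_values, mos_values), 3))
```

In this toy data the metric ranks pvs_b above pvs_c while observers prefer pvs_c, so the rank correlation falls below 1. A metric that "struggles to capture subjective quality" is one whose correlations with MOS stay low across the whole database.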

Paper Structure (24 sections, 14 figures, 4 tables)

Figures (14)

  • Figure 1: Illustration of the subjective evaluation of Radiance Field methods. Multiple multi-view video sequences are selected as sources. Pre-rendered Video Sequences (PVS) are prepared by reconstructing the sequences with the ensemble of Radiance Field methods to be evaluated. Following a selected evaluation methodology, a quality score is obtained from the observers' evaluation of the sequences.
  • Figure 2: Each source sequence is captured with a camera rig that is either 360° (depicted in (a)), front-facing (depicted in (b)), or a static single viewpoint (depicted in (c)). Yellow highlights the type of virtual camera used for the Pre-rendered Video Sequences: 360° virtual camera transition (a), front-facing virtual camera transition (b), or static single-viewpoint virtual camera (c).
  • Figure 3: 13 real-world SRCs used in our subjective experiment.
  • Figure 4: SI and TI of SRCs.
  • Figure 5: Colorfulness and GLCM Contrast of SRCs.
  • ...and 9 more figures