PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering

Junxi Jin, Xiulai Li, Haiping Huang, Lianjun Liu, Yujie Sun, Logan Liu

TL;DR

PEP-GS advances real-time 3D Gaussian Splatting by addressing view-dependent rendering and texture fidelity through a perceptually guided pipeline. It replaces traditional color encodings with Hierarchical Granular-Structural Attention and leverages Kolmogorov-Arnold Networks to robustly predict Gaussian opacity, rotation, and covariance, coupled with a multi-scale NLPD perceptual loss. The framework builds on a sparse SfM-derived anchor grid to maintain efficiency while refining Gaussians for local detail and global consistency. Experimental results on Mip-NeRF360, Tanks&Temples, and DeepBlending demonstrate improved perceptual quality and stability across challenging lighting and fine-scale structures, with ablations confirming the critical roles of HGSA, KAN, and NLPD. While achieving strong rendering quality, the method presents a small efficiency trade-off relative to Scaffold-GS, pointing to future work on speed optimizations.

Abstract

Recently, 3D Gaussian Splatting (3D-GS) has achieved significant success in real-time, high-quality 3D scene rendering. However, it faces several challenges, including Gaussian redundancy, limited ability to capture view-dependent effects, and difficulties in handling complex lighting and specular reflections. Additionally, methods that use spherical harmonics for color representation often struggle to effectively capture anisotropic components, especially when modeling view-dependent colors under complex lighting conditions, leading to insufficient contrast and unnatural color saturation. To address these limitations, we introduce PEP-GS, a perceptually-enhanced framework that dynamically predicts Gaussian attributes, including opacity, color, and covariance. We replace traditional spherical harmonics with a Hierarchical Granular-Structural Attention mechanism, which enables more accurate modeling of complex view-dependent color effects. By employing a stable and interpretable framework for opacity and covariance estimation, PEP-GS avoids prematurely removing essential Gaussians, ensuring a more accurate scene representation. Furthermore, perceptual optimization is applied to the final rendered images, enhancing perceptual consistency across different views and ensuring high-quality renderings with improved texture fidelity and fine-scale detail preservation. Experimental results demonstrate that PEP-GS outperforms state-of-the-art methods, particularly in challenging scenarios involving view-dependent effects and fine-scale details.
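To make the "perceptual optimization" step more concrete, here is a minimal NumPy sketch of a Normalized Laplacian Pyramid Distance (NLPD), the kind of multi-scale perceptual loss named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation. The 5-tap binomial filter, the reuse of that filter for the local normalization, the constant `sigma`, the number of levels, and all function names are hypothetical choices. It handles single-channel images only, and a real training loop would need a differentiable framework.

```python
import numpy as np

# Assumed 5-tap binomial low-pass filter (a common pyramid kernel).
_K1D = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def _blur(img):
    """Separable 5-tap blur with reflect padding; output keeps the input shape."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    # Filter along the horizontal axis, then along the vertical axis.
    rows = sum(wt * padded[:, i:i + w] for i, wt in enumerate(_K1D))
    return sum(wt * rows[i:i + h, :] for i, wt in enumerate(_K1D))


def _upsample(img, shape):
    """Nearest-neighbour 2x upsample, cropped to `shape`, then smoothed."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return _blur(up[:shape[0], :shape[1]])


def laplacian_pyramid(img, levels):
    """Return `levels` bands: band-pass details plus a low-pass residual."""
    bands = []
    cur = img.astype(np.float64)
    for _ in range(levels - 1):
        low = _blur(cur)[::2, ::2]                  # blur, then decimate
        bands.append(cur - _upsample(low, cur.shape))
        cur = low
    bands.append(cur)                               # low-pass residual
    return bands


def nlpd(a, b, levels=3, sigma=0.1):
    """Simplified NLPD: per-band RMSE after divisive normalization.

    Each band is divided by a local amplitude estimate plus `sigma`
    (a hypothetical constant), then the per-band RMSEs are averaged.
    """
    total = 0.0
    bands_a = laplacian_pyramid(a, levels)
    bands_b = laplacian_pyramid(b, levels)
    for band_a, band_b in zip(bands_a, bands_b):
        norm_a = band_a / (sigma + _blur(np.abs(band_a)))
        norm_b = band_b / (sigma + _blur(np.abs(band_b)))
        total += np.sqrt(np.mean((norm_a - norm_b) ** 2))
    return total / levels
```

For example, `nlpd(rendered, ground_truth)` returns zero for identical images and grows as the normalized band-pass content diverges. Because differences are measured after local normalization at several scales, such a loss is sensitive to local contrast and texture structure rather than to raw pixel error alone.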

Paper Structure

This paper contains 20 sections, 17 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Comparison of PEP-GS against SOTA methods. Our method achieves better perceptual consistency and more accurate view-dependent effects across different viewing angles, particularly in scenes with complex lighting and textures.
  • Figure 2: Overview of PEP-GS.
  • Figure 3: PSNR curves of Scaffold-GS and PEP-GS on the truck scene of the Tanks&Temples dataset, for training and test views.
  • Figure 4: Compared to existing baselines, our method demonstrates superior detail preservation, reduced artifacts, and improved color consistency. Notably, in the second row, our method also shows some improvement over other approaches in handling specular reflections.
  • Figure 5: Ablation studies. The top images show our rendering results, and the bottom images show the error maps computed between these renderings and the ground truth (GT). In these maps, more intense colors denote larger discrepancies from the GT. As in the table for the Mip-NeRF360 datasets, KOp and KCov denote the two variants that use the KAN framework to predict Gaussian opacity and covariance, respectively.
  • ...and 2 more figures