Drop-In Perceptual Optimization for 3D Gaussian Splatting

Ezgi Ozyilkan, Zhiqi Chen, Oren Rippel, Jona Ballé, Kedar Tatwawadi

Abstract

Although their output is ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, which results in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct a first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without incurring a higher splat count. WD-R is preferred by raters by more than $2.3\times$ over the original 3DGS loss, and by $1.5\times$ over the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across various datasets. It generalizes across recent frameworks such as Mip-Splatting and Scaffold-GS: replacing the original loss with WD-R consistently enhances perceptual quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS), and the resulting reconstructions are preferred by human raters by $1.8\times$ and $3.6\times$, respectively. These gains also carry over to the task of 3DGS scene compression, with $\approx 50\%$ bitrate savings for comparable perceptual metric performance.

Paper Structure (42 sections, 6 equations, 18 figures, 10 tables)

Figures (18)

  • Figure 1: Novel view rendering using 3DGS on the large-scale Barcelona scene from the BungeeNeRF dataset [xiangli2022bungeenerf]. We compare the original 3DGS distortion loss $\mathrm{L1}+\mathrm{SSIM}$ [kerbl3Dgaussians], Pixel-GS [zhang2024pixel], and the state of the art in splat-efficient perceptual quality, Perceptual-GS [zhou2025perceptual], to the best-performing perceptual losses from our studies: Wasserstein Distortion [WD_orig] (WD) and a variant weighted with the original 3DGS loss, which we denote as WD-Regularized (WD-R). $\#\mathrm{G}$ denotes the splat count for each method, and green bars show the ratio of human raters preferring rendered image patches from the respective loss for this scene (i.e., for every rater preferring the original loss, there are 2.4 and 3.2 raters preferring WD and WD-R, respectively).
  • Figure 2: 3DGS representation and compression frameworks optimized using \ref{eq:2D_distortion} and \ref{eq:RD}, respectively, incorporating the perceptual losses discussed in \ref{subsec:perceptual_losses}.
  • Figure 3: 1D sketch of visual textures with large pointwise difference but small Wasserstein distortion (WD) (see \ref{eq:WD}). The original and reconstructed textures look the same at first glance and share nearly the same local mean and standard deviation, yet they differ greatly pointwise. In contrast to all other metrics we compare here, WD evaluates differences in terms of local statistics (i.e., the first and second moments, $\boldsymbol{\mu}$ and $\boldsymbol{\nu}$).
  • Figure 4: Bayesian Elo scores for 3DGS representation methods across indoor scenes (Deep Blending [DeepBlending2018], Mip-NeRF 360 indoor [barron2022mipnerf360]), outdoor scenes (Tanks & Temples [Knapitsch2017], Mip-NeRF 360 outdoor [barron2022mipnerf360], and BungeeNeRF [xiangli2022bungeenerf]), and all scenes combined. WD-R and WD achieve the highest scores in all settings (within the 95% confidence interval).
  • Figure 5: Visual comparison of the novel view synthesis results obtained by the original 3DGS [kerbl3Dgaussians], Pixel-GS [zhang2024pixel], Perceptual-GS [zhou2025perceptual], and the perceptual loss families discussed in \ref{subsec:perceptual_losses}. The left images show the full scenes, with detailed crops highlighting reconstruction differences across methods, where $\#\mathrm{G}$ indicates the number of Gaussian splats for each method.
  • ...and 13 more figures
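
The intuition in the Figure 3 caption (large pointwise difference, nearly identical local mean and standard deviation) can be illustrated with a toy 1D example. The sketch below is not the paper's WD definition, which is not reproduced in this listing. As an assumption, it uses a simplified proxy: for each sliding window it fits a Gaussian to the local mean and standard deviation and averages the squared 2-Wasserstein distance between the two Gaussians, $(\mu_1-\mu_2)^2+(\sigma_1-\sigma_2)^2$. The function names, the sinusoidal "texture", and the window size are all hypothetical choices made for illustration.

```python
import math


def local_stats(signal, window):
    """Return (mean, std) for every full sliding window over the signal."""
    stats = []
    for start in range(len(signal) - window + 1):
        chunk = signal[start:start + window]
        mean = sum(chunk) / window
        var = sum((v - mean) ** 2 for v in chunk) / window
        stats.append((mean, math.sqrt(var)))
    return stats


def pointwise_mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def local_stat_distortion(a, b, window):
    """Toy proxy (an assumption, not the paper's WD): average squared
    2-Wasserstein distance between per-window Gaussians fitted to each signal."""
    stats_a = local_stats(a, window)
    stats_b = local_stats(b, window)
    total = sum(
        (mean_a - mean_b) ** 2 + (std_a - std_b) ** 2
        for (mean_a, std_a), (mean_b, std_b) in zip(stats_a, stats_b)
    )
    return total / len(stats_a)


# Hypothetical "texture": a sinusoid, and a copy shifted by half a period.
period = 8
n = 64
original = [math.sin(2 * math.pi * i / period) for i in range(n)]
reconstructed = [math.sin(2 * math.pi * i / period + math.pi) for i in range(n)]

mse = pointwise_mse(original, reconstructed)
stat_dist = local_stat_distortion(original, reconstructed, window=period)

print(f"pointwise MSE:            {mse:.6f}")
print(f"local-statistics proxy:   {stat_dist:.6e}")
```

With a window spanning one full period, both signals have the same local mean and standard deviation in every window, so the proxy is numerically zero while the pointwise MSE is about 2.0. A pixel-level loss would penalize this reconstruction heavily even though its local statistics match those of the original.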