SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting

Pranav Asthana, Alex Hanson, Allen Tu, Tom Goldstein, Matthias Zwicker, Amitabh Varshney

TL;DR

SplatSuRe tackles the problem of generating high-resolution, multi-view-consistent renders from low-resolution inputs in 3D Gaussian Splatting. It introduces a geometry-aware, selective SR framework that computes a per-Gaussian fidelity score and per-view weight maps to constrain SR supervision to undersampled regions, thereby avoiding view-inconsistent artifacts. The method couples LR supervision with selectively weighted SR losses and demonstrates state-of-the-art performance across major datasets, with notable gains in foreground detail and cross-view consistency. This selective approach enables sharper, more realistic renders without compromising the underlying 3DGS pipeline or requiring additional neural components.

Abstract

3D Gaussian Splatting (3DGS) enables high-quality novel view synthesis, motivating interest in generating higher-resolution renders than those available during training. A natural strategy is to apply super-resolution (SR) to low-resolution (LR) input views, but independently enhancing each image introduces multi-view inconsistencies, leading to blurry renders. Prior methods attempt to mitigate these inconsistencies through learned neural components, temporally consistent video priors, or joint optimization on LR and SR views, but all uniformly apply SR across every image. In contrast, our key insight is that close-up LR views may contain high-frequency information for regions also captured in more distant views, and that we can use the camera pose relative to scene geometry to inform where to add SR content. Building from this insight, we propose SplatSuRe, a method that selectively applies SR content only in undersampled regions lacking high-frequency supervision, yielding sharper and more consistent results. Across Tanks & Temples, Deep Blending and Mip-NeRF 360, our approach surpasses baselines in both fidelity and perceptual quality. Notably, our gains are most significant in localized foreground regions where higher detail is desired.

Paper Structure

This paper contains 26 sections, 12 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Overview of our SplatSuRe framework. A high-resolution (HR) 3D Gaussian Splatting (3DGS) model is trained using low-resolution (LR) and super-resolution (SR) inputs. We first train a 3DGS model on LR inputs to identify undersampled regions and render per-view weight maps that indicate where SR is needed. During training of the HR 3DGS model, the images produced by the frozen single-image super-resolution (SISR) model are spatially weighted by these maps to form the SR loss $\mathcal{L}_{SR}$. A complementary LR loss $\mathcal{L}_{LR}$ compares the downsampled HR render against the original LR ground truth to provide consistent supervision across the entire image.
  • Figure 2: Disparity in high-frequency ground truth information across different views. Low-resolution ground truth from near cameras provides high-resolution information for rendering distant views, reducing the need for additional generated detail in those views. Conversely, super-resolution is needed in views where no other camera provides higher-resolution information.
  • Figure 3: Super-resolution weight maps. Bright regions indicate areas where generative detail is required, while dark regions correspond to areas well-sampled by other low-resolution views. Note that high weights are obtained in regions that are either not sampled closely, such as the background trees behind the tractor, or where other views do not provide higher-resolution information, such as the foreground table in the ballroom.
  • Figure 4: Qualitative results on Tanks & Temples [Knapitsch2017tandt], Deep Blending [DeepBlending2018], and Mip-NeRF 360 [barron2022mipnerf360]. Experiments are performed at $4\times$ super-resolution with ratio threshold $\tau{=}1.1$. Compared to Mip-Splatting [Yu2024MipSplatting] and SRGS [feng2024srgssuperresolution3dgaussian], our method produces sharper, more faithful reconstructions that better align with ground truth while maintaining cross-view consistency. It preserves finer details in text (red box on truck), high-frequency patterns (yellow box on carpet and green box on tray), and recognizable distant objects observed in other views (blue box on church mural). Notably, it reduces Gaussian artifacts (orange arrow) observed in other methods. Additional results are provided in the appendix.
  • Figure 5: Effect of ratio threshold on Tanks & Temples [Knapitsch2017tandt]. Weight maps, where bright regions indicate higher SR influence, are shown below the corresponding ratio thresholds. $\tau{=}0$ and $\tau{=}\infty$ correspond to zero and full use of super-resolution, respectively. SR initially improves rendering quality, but excessive use worsens results. The effect of the ratio threshold on different scenes is analyzed in the per-scene analysis appendix.
  • ...and 4 more figures
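
The two training losses described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The L1 error, the average-pooling downsampler, the balancing factor `lambda_sr`, and the function names `downsample_avg` and `selective_sr_loss` are all assumptions made for illustration. The only parts taken from the caption are the structure: an SR loss that is spatially weighted by a per-view weight map, and an LR loss that compares the downsampled HR render against the LR ground truth.

```python
import numpy as np


def downsample_avg(img, factor):
    """Average-pool an (H, W, C) image by an integer factor.

    Assumption: the caption does not say which downsampling operator is used;
    box filtering is chosen here only because it is simple.
    """
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def selective_sr_loss(hr_render, sr_image, lr_gt, weight_map,
                      factor=4, lambda_sr=1.0):
    """Return (total, loss_lr, loss_sr) for one training view.

    hr_render  : (H, W, C) render of the HR 3DGS model
    sr_image   : (H, W, C) output of the frozen SISR model
    lr_gt      : (H/factor, W/factor, C) original LR ground truth
    weight_map : (H, W) values in [0, 1]; high where SR content is wanted
    lambda_sr  : hypothetical balancing factor (not given in the source)
    """
    # SR loss: per-pixel L1 error against the SISR image, scaled by the map,
    # so pixels with weight 0 receive no SR supervision at all.
    per_pixel = np.abs(hr_render - sr_image).mean(axis=-1)
    loss_sr = (weight_map * per_pixel).mean()

    # LR loss: supervises the whole image through the downsampled HR render.
    loss_lr = np.abs(downsample_avg(hr_render, factor) - lr_gt).mean()

    return loss_lr + lambda_sr * loss_sr, loss_lr, loss_sr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((8, 8, 3))
    sr = rng.random((8, 8, 3))
    lr = downsample_avg(hr, 4)          # LR target consistent with the render
    w = np.zeros((8, 8))
    w[:, :4] = 1.0                      # request SR detail on the left half only
    total, loss_lr, loss_sr = selective_sr_loss(hr, sr, lr, w)
    print(total, loss_lr, loss_sr)
```

With an all-zero weight map the SR term vanishes and only LR supervision remains. With an all-one map the SR image supervises every pixel. These two cases mirror the zero and full use of super-resolution described for $\tau{=}0$ and $\tau{=}\infty$ in the Figure 5 caption.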