SplatSuRe: Selective Super-Resolution for Multi-view Consistent 3D Gaussian Splatting
Pranav Asthana, Alex Hanson, Allen Tu, Tom Goldstein, Matthias Zwicker, Amitabh Varshney
TL;DR
SplatSuRe addresses the problem of producing high-resolution, multi-view-consistent renders from low-resolution inputs in 3D Gaussian Splatting. It introduces a geometry-aware, selective SR framework that computes a per-Gaussian fidelity score and per-view weight maps, restricting SR supervision to undersampled regions and thereby avoiding view-inconsistent artifacts. The method combines LR supervision with selectively weighted SR losses and surpasses baselines on Tanks & Temples, Deep Blending, and Mip-NeRF 360, with the largest gains in foreground detail and cross-view consistency. This selective approach yields sharper, more realistic renders without compromising the underlying 3DGS pipeline or requiring additional neural components.
Abstract
3D Gaussian Splatting (3DGS) enables high-quality novel view synthesis, motivating interest in generating higher-resolution renders than those available during training. A natural strategy is to apply super-resolution (SR) to low-resolution (LR) input views, but independently enhancing each image introduces multi-view inconsistencies, leading to blurry renders. Prior methods attempt to mitigate these inconsistencies through learned neural components, temporally consistent video priors, or joint optimization on LR and SR views, but all of them apply SR uniformly across every image. In contrast, our key insight is that close-up LR views may contain high-frequency information for regions also captured in more distant views, and that we can use the camera pose relative to scene geometry to inform where to add SR content. Building on this insight, we propose SplatSuRe, a method that selectively applies SR content only in undersampled regions lacking high-frequency supervision, yielding sharper and more consistent results. Across Tanks & Temples, Deep Blending, and Mip-NeRF 360, our approach surpasses baselines in both fidelity and perceptual quality. Notably, our gains are most significant in localized foreground regions where higher detail is desired.
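As a rough illustration of selectively weighted supervision, the following is a minimal sketch and not the formulation used in the paper. It assumes a per-view weight map in [0, 1] is already available (how it is derived from the per-Gaussian fidelity score is not shown), and it assumes L1 losses, average-pool downsampling, and a balancing factor. The names `selective_sr_loss`, `weight_map`, and `lam` are hypothetical.

```python
import numpy as np

def selective_sr_loss(render_hr, sr_target, lr_target, weight_map, scale, lam=1.0):
    """Illustrative loss: LR supervision everywhere, SR supervision only where weighted.

    render_hr : (H*scale, W*scale, C) high-resolution render
    sr_target : (H*scale, W*scale, C) super-resolved reference image
    lr_target : (H, W, C) original low-resolution input view
    weight_map: (H*scale, W*scale) values in [0, 1]; assumed to be high
                in undersampled regions and low elsewhere
    """
    hs, ws, c = render_hr.shape
    h, w = hs // scale, ws // scale

    # Average-pool the render down to LR size (an assumed downsampling model).
    render_lr = render_hr.reshape(h, scale, w, scale, c).mean(axis=(1, 3))
    lr_loss = np.abs(render_lr - lr_target).mean()

    # Per-pixel L1 error against the SR image, masked by the weight map.
    sr_err = np.abs(render_hr - sr_target).mean(axis=-1)
    sr_loss = (weight_map * sr_err).sum() / max(weight_map.sum(), 1e-8)

    return lr_loss + lam * sr_loss
```

With an all-zero weight map the SR term vanishes and only the LR views supervise the render; with an all-one weight map the sketch reduces to uniform SR supervision, which is the behavior the abstract attributes to prior methods.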
