Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework
Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo, Alexander Schwing, Jia-Bin Huang
TL;DR
This work addresses the challenge of recovering high-resolution (HR), geometrically consistent 3D scenes from low-resolution inputs by coupling diffusion-based 2D super-resolution with a 3D Gaussian-splatting (3DGS) representation. The proposed 3DSR framework uses a diffusion prior to generate HR views, then relies on a 3DGS representation to enforce cross-view coherence, iteratively updating the latent representations to maintain 3D consistency. Evaluations on LLFF and MipNeRF360 show superior perceptual quality and improved 3D consistency (measured by MEt3R and FID) compared with image super-resolution (ISR), video super-resolution (VSR), and diffusion-based baselines, without fine-tuning diffusion models for video data. The results demonstrate that 3DSR achieves sharper textures, fewer cross-view artifacts, and structurally faithful reconstructions, enabling high-quality 3D super-resolution suitable for realistic novel view synthesis.
Abstract
We propose 3D Super Resolution (3DSR), a novel 3D Gaussian-splatting-based super-resolution framework that leverages off-the-shelf diffusion-based 2D super-resolution models. 3DSR encourages 3D consistency across views by using an explicit 3D Gaussian-splatting-based scene representation. This distinguishes 3DSR from prior work, such as image upsampling or video super-resolution, which either does not consider 3D consistency or aims to incorporate it only implicitly. Notably, our method enhances visual quality without additional fine-tuning while ensuring spatial coherence within the reconstructed scene. We evaluate 3DSR on the MipNeRF360 and LLFF datasets, demonstrating that it produces high-resolution results that are visually compelling while maintaining structural consistency in 3D reconstructions.
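To make the alternation summarized in the TL;DR concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a 1D "scene" observed through overlapping windows in place of camera views, uses random perturbations as a stand-in for a per-view diffusion super-resolution step, and uses simple averaging as a stand-in for fitting an explicit shared representation such as 3DGS. It does not model diffusion latents. All function names (`per_view_enhance`, `fuse_to_scene`, `render`, `inconsistency`, `toy_3d_consistent_sr`) are hypothetical.

```python
import numpy as np


def per_view_enhance(view, rng, strength=0.1):
    """Hypothetical stand-in for a 2D diffusion SR step.

    Each view receives its own independent "detail", which is what makes
    independently enhanced views disagree with one another.
    """
    return view + strength * rng.standard_normal(view.shape)


def fuse_to_scene(views, offsets, scene_len):
    """Hypothetical stand-in for fitting one explicit shared scene.

    Overlapping views are averaged into a single 1D scene array.
    """
    acc = np.zeros(scene_len)
    cnt = np.zeros(scene_len)
    for v, o in zip(views, offsets):
        acc[o:o + len(v)] += v
        cnt[o:o + len(v)] += 1
    return acc / np.maximum(cnt, 1)


def render(scene, offset, view_len):
    """Re-render a view from the shared scene (here: take a window)."""
    return scene[offset:offset + view_len].copy()


def inconsistency(views, offsets, scene_len):
    """Mean squared disagreement between the views and their fused scene."""
    scene = fuse_to_scene(views, offsets, scene_len)
    errs = [np.mean((v - render(scene, o, len(v))) ** 2)
            for v, o in zip(views, offsets)]
    return float(np.mean(errs))


def toy_3d_consistent_sr(views, offsets, scene_len, n_iters=3, seed=0):
    """Alternate per-view enhancement with projection onto a shared scene."""
    rng = np.random.default_rng(seed)
    scene = fuse_to_scene(views, offsets, scene_len)
    for _ in range(n_iters):
        enhanced = [per_view_enhance(v, rng) for v in views]
        scene = fuse_to_scene(enhanced, offsets, scene_len)
        # Replacing each view by its render keeps all views mutually consistent.
        views = [render(scene, o, len(v)) for v, o in zip(views, offsets)]
    return views, scene
```

In this toy setting, views enhanced independently disagree wherever they overlap, whereas views re-rendered from the shared scene agree by construction. This mirrors, in a much simplified form, why an explicit shared representation can enforce cross-view coherence that per-image upsampling does not.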
