SuperGaussian: Repurposing Video Models for 3D Super Resolution

Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Frühstück

TL;DR

Coarse 3D content often falls short of the fidelity now common in images and video. SuperGaussian repurposes pretrained video upsampling models for 3D super-resolution in three steps: it renders a multi-view video from the coarse scene, upsamples that video with a video prior, and consolidates the result into a 3D Gaussian Splat representation. The method is modular and domain-agnostic; finetuning on domain data handles modality-specific artifacts. It improves perceptual and geometric fidelity across diverse inputs (e.g., Gaussian Splats, NeRFs, and noisy scans), reduces the need for large-scale 3D datasets, and can be integrated into existing 3D workflows.

Abstract

We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io
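The three stages named in the abstract (render a video from the coarse model, upsample it, consolidate back into 3D) can be sketched as a small wrapper. This is only an illustrative skeleton, not the authors' implementation: `SuperResolutionPipeline`, `nearest_neighbor_upsample`, `toy_render`, and `toy_consolidate` are hypothetical names, and the nearest-neighbor function merely stands in for a pretrained video upsampler so the example runs without any model weights.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# One grayscale image as rows of pixel values (illustrative representation).
Frame = List[List[float]]


@dataclass
class SuperResolutionPipeline:
    """Hypothetical three-stage wrapper; none of these names come from the paper."""
    render: Callable[[object, Sequence[object]], List[Frame]]       # coarse 3D + poses -> low-res frames
    upsample_video: Callable[[List[Frame]], List[Frame]]            # low-res frames -> high-res frames
    consolidate: Callable[[List[Frame], Sequence[object]], object]  # high-res frames + poses -> 3D model

    def run(self, coarse_model: object, poses: Sequence[object]) -> object:
        low_res = self.render(coarse_model, poses)
        high_res = self.upsample_video(low_res)
        if len(high_res) != len(poses):
            raise ValueError("upsampler must return one frame per camera pose")
        return self.consolidate(high_res, poses)


def nearest_neighbor_upsample(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Stand-in for a pretrained video upsampler: repeats each pixel `factor` times."""
    out = []
    for frame in frames:
        new_frame = []
        for row in frame:
            wide = [p for p in row for _ in range(factor)]         # stretch horizontally
            new_frame.extend(list(wide) for _ in range(factor))    # stretch vertically
        out.append(new_frame)
    return out


def toy_render(model: Frame, poses: Sequence[object]) -> List[Frame]:
    """Toy renderer: pretends every pose sees the same small image."""
    return [[row[:] for row in model] for _ in poses]


def toy_consolidate(frames: List[Frame], poses: Sequence[object]) -> dict:
    """Toy consolidation: reports what a real 3D fitting step would receive."""
    return {"num_views": len(frames), "resolution": (len(frames[0]), len(frames[0][0]))}


pipeline = SuperResolutionPipeline(toy_render, nearest_neighbor_upsample, toy_consolidate)
result = pipeline.run([[0.0, 1.0], [1.0, 0.0]], poses=[0, 1, 2])
```

Keeping each stage behind a plain callable mirrors the modularity the abstract emphasizes: a different upsampler or 3D fitting routine could be swapped in without touching the other two stages.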

Paper Structure (14 sections, 1 equation, 12 figures, 3 tables)

Figures (12)

  • Figure 1: We present SuperGaussian, a novel method that repurposes existing video upsampling methods for the 3D super-resolution task. SuperGaussian can handle various input types such as NeRFs, Gaussian Splats, reconstructions obtained from noisy scans, models generated by recent text-to-3D methods [Li2023Instant3D], or low-poly meshes (e.g., assets used in Sim-on-Wheels [shen2023sim]). SuperGaussian generates high-resolution 3D outputs with rich geometric and texture details in the form of Gaussian Splats.
  • Figure 2: SuperGaussian pipeline. Given an input low-res 3D representation, which can be in various formats, we first sample a smooth camera trajectory and render an intermediate low-resolution video. We then upsample this video using existing video upsamplers and obtain a higher resolution 3D representation that has sharper and more vivid details. Our method, SuperGaussian, produces a final 3D representation in the form of high-resolution Gaussian Splats.
  • Figure 3: Qualitative comparisons on MVImgNet to upsample low-res Gaussian Splattings. The low-resolution inputs are novel views rendered from fitted low-res Gaussian Splattings along the sampled trajectories. Our method produces the best upsampled 3D scenes with generative texture and geometric level details preserved. Please check our supplementary website to interactively compare results in 3D.
  • Figure 4: Qualitative results of SuperGaussian on the Blender synthetic dataset. Besides increased sharpness, SuperGaussian can nicely lift generative details from video upsampling to 3D. The background image is our upsampled 3D NeRF. The blue, yellow, and red boxes indicate zoom-in views of the low-res input, our upsampled result, and the ground truth, respectively. The test poses, ordered left to right, are chosen to align with Fig. 5 in NeRF-SR [Wang2022NeRFSR] for direct comparison.
  • Figure 5: Qualitative evaluation of both video and image upsampling priors for 3D upsampling on the Wild RGB-D dataset [xia2024rgbd] without finetuning. The results visualize a 3D upsampled output from one test scene. Compared with using the image upsampler, adopting the video upsampler can distill more detail to 3D, as can be seen (for example) from the crisper characters of the book.
  • ...and 7 more figures
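
The Figure 2 caption mentions sampling a smooth camera trajectory before rendering the low-resolution video. The captions do not say how the trajectory is built, so the following is only one plausible sketch: a circular orbit around the origin at fixed elevation, with every camera looking at the object center. The function `orbit_poses`, the `CameraPose` type, the z-up convention, and the default radius and elevation are all assumptions made for illustration.

```python
import math
from typing import List, NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class CameraPose(NamedTuple):
    position: Vec3  # camera center in world coordinates
    forward: Vec3   # unit vector from the camera toward the origin
    right: Vec3     # unit vector to the camera's right
    up: Vec3        # unit vector completing the orthonormal frame


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def orbit_poses(num_frames: int, radius: float = 2.0,
                elevation_deg: float = 20.0) -> List[CameraPose]:
    """Evenly spaced look-at poses on a circle around the origin (z-up, assumed)."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if abs(elevation_deg) >= 90.0:
        raise ValueError("elevation must be strictly between -90 and 90 degrees")
    phi = math.radians(elevation_deg)
    world_up: Vec3 = (0.0, 0.0, 1.0)
    poses = []
    for i in range(num_frames):
        theta = 2.0 * math.pi * i / num_frames  # equal azimuth steps keep motion smooth
        position = (radius * math.cos(phi) * math.cos(theta),
                    radius * math.cos(phi) * math.sin(theta),
                    radius * math.sin(phi))
        forward = _normalize((-position[0], -position[1], -position[2]))
        right = _normalize(_cross(forward, world_up))
        up = _cross(right, forward)
        poses.append(CameraPose(position, forward, right, up))
    return poses


poses = orbit_poses(num_frames=8)
```

Equal azimuth steps give consecutive frames a constant baseline, which is the kind of temporal coherence a video upsampler would presumably benefit from; other smooth paths (spirals, splines through captured poses) would fit the same interface.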