3D Gaussian Splatting for Real-Time Radiance Field Rendering

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis

TL;DR

The paper tackles real-time, high-quality novel-view synthesis for unbounded scenes. It introduces a differentiable scene representation made of 3D Gaussians with anisotropic covariances, initialized from SfM points and optimized with adaptive density control. A fast, tile-based differentiable rasterizer performs visibility-aware anisotropic splatting that respects depth order and supports backpropagation through large numbers of splats, delivering real-time 1080p rendering with competitive training times. The approach reaches visual quality comparable to the state of the art on real and synthetic datasets, trains substantially faster than the previous state-of-the-art NeRF methods, and renders in real time, making radiance-field rendering practical for interactive use. Limitations include artifacts in poorly observed regions and high memory usage during training, but the method points toward real-time, explicit 3D representations for neural rendering.
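The visibility-aware, depth-ordered blending mentioned above can be illustrated for a single pixel. The sketch below is a minimal NumPy version, not the authors' CUDA rasterizer: it assumes each splat's per-pixel opacity (its learned opacity times its projected 2D Gaussian falloff) and color are already computed, and it replaces the paper's tile binning and GPU sort with a plain depth sort. The function name `composite_pixel` and the cutoff `t_min` are hypothetical.

```python
import numpy as np

def composite_pixel(colors, alphas, depths, t_min=1e-4):
    """Front-to-back alpha compositing of the splats overlapping one pixel.

    colors: (N, 3) RGB contribution of each splat at this pixel
    alphas: (N,)   per-pixel opacity of each splat, in [0, 1]
    depths: (N,)   view-space depth, used only for ordering
    t_min:  stop once remaining transmittance drops below this value
            (hypothetical threshold chosen for illustration)
    """
    order = np.argsort(depths)          # nearest splat first
    out = np.zeros(3)
    transmittance = 1.0                 # fraction of light not yet absorbed
    for i in order:
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < t_min:       # pixel saturated: farther splats are hidden
            break
    return out, transmittance
```

Because the loop walks splats nearest first, an almost opaque front splat ends the loop early, which is what makes depth sorting pay off: most of the work for occluded splats is skipped.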

Abstract

Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and, importantly, allow high-quality real-time (≥ 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space. Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene. Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows real-time rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
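Optimizing an anisotropic covariance matrix directly by gradient descent can produce matrices that are not valid covariances. A standard remedy, and the factorization we understand the paper to use, is to optimize a rotation (as a quaternion) and a per-axis scale, then assemble Σ = R S Sᵀ Rᵀ, which is positive semi-definite by construction. The NumPy sketch below assumes a (w, x, y, z) quaternion layout and an exponential activation on a log-scale parameter; the helper names `quat_to_rotmat` and `covariance_3d` are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)   # normalize so any nonzero 4-vector is valid
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance_3d(q, log_scale):
    """Sigma = R S S^T R^T, symmetric and positive semi-definite by construction.

    q:         rotation as a (w, x, y, z) quaternion (assumed layout)
    log_scale: per-axis log standard deviations; exp keeps scales positive
               (assumed activation)
    """
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    S = np.diag(np.exp(log_scale))
    M = R @ S
    return M @ M.T
```

Since R is orthonormal, the eigenvalues of Σ are exactly the squared scales, so the optimizer controls the ellipsoid's extent through `log_scale` and its orientation through `q` independently.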

Paper Structure (33 sections, 12 equations, 11 figures, 9 tables, 2 algorithms)

Figures (11)

  • Figure 1: Optimization starts with the sparse SfM point cloud and creates a set of 3D Gaussians. We then optimize and adaptively control the density of this set of Gaussians. During optimization we use our fast tile-based renderer, allowing competitive training times compared to SOTA fast radiance field methods. Once trained, our renderer allows real-time navigation for a wide variety of scenes.
  • Figure 2: We visualize the 3D Gaussians after optimization by shrinking them 60% (far right). This clearly shows the anisotropic shapes of the 3D Gaussians that compactly represent complex geometry after optimization. Left: the actual rendered image.
  • Figure 3: Our adaptive Gaussian densification scheme. Top row (under-reconstruction): When small-scale geometry (black outline) is insufficiently covered, we clone the respective Gaussian. Bottom row (over-reconstruction): If small-scale geometry is represented by one large splat, we split it in two.
  • Figure 4: We show comparisons of ours to previous methods and the corresponding ground truth images from held-out test views. The scenes are, from the top down: Bicycle, Garden, Stump, Counter and Room from the Mip-NeRF360 dataset; Playroom and DrJohnson from the Deep Blending dataset [Hedman et al. 2018]; and Truck and Train from Tanks&Temples. Non-obvious differences in quality are highlighted by arrows/insets.
  • Figure 5: For some scenes (above) we can see that even at 7K iterations (~5 min for this scene), our method has captured the train quite well. At 30K iterations (~35 min) the background artifacts have been reduced significantly. For other scenes (below), the difference is barely visible; 7K iterations (~8 min) is already very high quality.
  • ...and 6 more figures
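The clone/split rule in Figure 3 can be sketched as a single pass over the Gaussians. This is a minimal NumPy illustration, not the authors' implementation: it assumes axis-aligned Gaussians (rotation omitted), a threshold on an accumulated positional-gradient magnitude to decide which Gaussians still need densifying, and a size threshold separating "small" (clone) from "large" (split). The function `densify` and its parameters `grad_thresh`, `size_thresh`, and `split_factor` are hypothetical, and their default values are illustrative rather than tuned.

```python
import numpy as np

def densify(means, scales, grad_norms, grad_thresh=2e-4, size_thresh=0.01,
            split_factor=1.6, seed=0):
    """One clone/split pass over axis-aligned Gaussians (hypothetical helper).

    means:      (N, 3) centers
    scales:     (N, 3) per-axis standard deviations
    grad_norms: (N,)   accumulated positional-gradient magnitude per Gaussian
    Returns the new (means, scales) arrays.
    """
    rng = np.random.default_rng(seed)
    hot = grad_norms > grad_thresh            # regions still poorly reconstructed
    small = scales.max(axis=1) <= size_thresh
    clone = hot & small                       # under-reconstruction: duplicate
    split = hot & ~small                      # over-reconstruction: break apart

    keep = ~split                             # split parents are replaced
    new_means = [means[keep], means[clone]]   # clones start as plain copies
    new_scales = [scales[keep], scales[clone]]

    # Each split parent becomes two children whose centers are sampled from
    # the parent Gaussian and whose axes are shrunk by split_factor.
    for _ in range(2):
        offsets = rng.standard_normal((split.sum(), 3)) * scales[split]
        new_means.append(means[split] + offsets)
        new_scales.append(scales[split] / split_factor)

    return np.concatenate(new_means), np.concatenate(new_scales)
```

Splitting replaces one large Gaussian with two smaller ones drawn from it, so the covered region stays roughly the same while detail increases; cloning adds more Gaussians where small ones are too sparse to cover the geometry.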