OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities

Suyoung Lee, Jaeyoung Chung, Kihoon Kim, Jaeyoo Huh, Gunhee Lee, Minsoo Lee, Kyoung Mu Lee

TL;DR

OmniSplat addresses the challenge of producing high-quality, view-consistent 3D reconstructions from sparse omnidirectional images without per-scene optimization. It achieves this by projecting omnidirectional data onto quasi-uniform Yin-Yang grids, applying cross-view attention across Yin and Yang domains to estimate 3D Gaussian splats, and rendering with a Yin-Yang rasterizer that minimizes pole artifacts. The approach significantly outperforms perspective-trained feed-forward methods and optimization-based baselines in novel-view synthesis while maintaining fast inference, with additional capabilities for fast, multiview-consistent segmentation and editing. This makes omnidirectional 3D scene synthesis more practical for VR/AR, robotics, and immersive visualization where quick, editable reconstructions are valuable.

Abstract

Feed-forward 3D Gaussian splatting (3DGS) models have gained significant popularity because they generate scenes immediately, without per-scene optimization. Although omnidirectional images are becoming more popular since they reduce the computation required for image stitching to composite a holistic scene, existing feed-forward models are designed only for perspective images. The unique optical properties of omnidirectional images make it difficult for feature encoders to correctly understand the context of the image and make the Gaussians non-uniform in space, which degrades the quality of images synthesized from novel views. We propose OmniSplat, a training-free, fast feed-forward 3DGS generation framework for omnidirectional images. We adopt a Yin-Yang grid and decompose images based on it to reduce the domain gap between omnidirectional and perspective images. The Yin-Yang grid can use the existing CNN structure as it is, and its quasi-uniform characteristic makes the decomposed images similar to perspective images, so it can exploit the strong prior knowledge of the learned feed-forward network. OmniSplat demonstrates higher reconstruction accuracy than existing feed-forward networks trained on perspective images. Our project page is available at: https://robot0321.github.io/omnisplat/index.html.
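
The Yin-Yang decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard Yin-Yang layout, in which each grid spans ±45° latitude and ±135° longitude and the Yang grid is the Yin grid under the axis swap (x, y, z) → (−x, z, y). It also assumes an equirectangular input and nearest-neighbour sampling. All function names are hypothetical.

```python
import numpy as np

LAT_MAX = np.pi / 4        # assumed Yin extent: +/-45 deg latitude
LON_MAX = 3 * np.pi / 4    # assumed Yin extent: +/-135 deg longitude

def yin_grid_directions(h, w):
    """Unit view directions for an h x w Yin grid (uniform in lat/lon)."""
    lat = np.linspace(LAT_MAX, -LAT_MAX, h)
    lon = np.linspace(-LON_MAX, LON_MAX, w)
    lon, lat = np.meshgrid(lon, lat)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def yin_to_yang(d):
    """Axis swap (x, y, z) -> (-x, z, y); applying it twice is the identity."""
    return np.stack([-d[..., 0], d[..., 2], d[..., 1]], axis=-1)

def in_yin(d):
    """True where a unit direction falls inside the Yin patch."""
    lat = np.arcsin(np.clip(d[..., 2], -1.0, 1.0))
    lon = np.arctan2(d[..., 1], d[..., 0])
    return (np.abs(lat) <= LAT_MAX) & (np.abs(lon) <= LON_MAX)

def sample_equirect(erp, d):
    """Nearest-neighbour lookup of directions d in an equirectangular image."""
    H, W = erp.shape[:2]
    lon = np.arctan2(d[..., 1], d[..., 0])             # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 2], -1.0, 1.0))     # [-pi/2, pi/2]
    u = ((lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    v = np.clip(((np.pi / 2 - lat) / np.pi * H).astype(int), 0, H - 1)
    return erp[v, u]

def decompose(erp, h, w):
    """Split an equirectangular image into a (Yin, Yang) pair of h x w images."""
    d = yin_grid_directions(h, w)
    return sample_equirect(erp, d), sample_equirect(erp, yin_to_yang(d))

if __name__ == "__main__":
    erp = np.random.rand(256, 512, 3)       # stand-in omnidirectional image
    yin, yang = decompose(erp, 128, 384)    # 1:3 aspect matches 90 x 270 deg
    print(yin.shape, yang.shape)
```

Neither patch reaches the poles of its own coordinate system, so both avoid the strong stretching that an equirectangular image shows near the top and bottom rows. Together the two patches cover the whole sphere with a small overlap.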

Paper Structure

This paper contains 20 sections, 7 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: PSNR-runtime trade-off. Reconstruction PSNR versus runtime for novel-view images on OmniBlender (Choi et al., 2023). OmniSplat shows the best trade-off compared to the original feed-forward networks for perspective images.
  • Figure 2: The overall process of OmniSplat. The two reference omnidirectional images are decomposed into Yin-Yang images, and cross-view attention is performed across the grids along epipolar lines to build a cost volume. The 3DGS parameters are then estimated, and Yin and Yang images are rasterized from the novel view and combined to synthesize the final omnidirectional image. In the cross-view attention panel, the red and yellow points are shown with their corresponding sphere-sweep curves in the same color. Each image cross-attends to the Yin-Yang images from the other views, following geometric constraints.
  • Figure 3: Qualitative comparison. Examples of synthesized novel-view images on various datasets. The scenes are taken from OmniBlender, Ricoh360, and OmniPhotos, respectively. Best viewed when zoomed in.
  • Figure 4: Visualization of segment matching. We visualize matched segment samples between the source (top) and target (bottom) views. The stars mark the user's query points; the objects containing the stars are segmented.
  • Figure A: PSNR-runtime trade-off. Reconstruction performance versus runtime for novel-view images on OmniBlender (Choi et al., 2023), including an ablation over the number of optimization steps.
  • ...and 8 more figures