CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis

Qiwei Wang, Xianghui Ze, Jingyi Yu, Yujiao Shi

TL;DR

This work introduces CylinderSplat, a feed-forward framework for panoramic 3DGS that achieves state-of-the-art results in both single-view and multi-view panoramic novel view synthesis, outperforming previous methods in both reconstruction quality and geometric accuracy.

Abstract

Feed-forward 3D Gaussian Splatting (3DGS) has shown great promise for real-time novel view synthesis, but its application to panoramic imagery remains challenging. Existing methods often rely on multi-view cost volumes for geometric refinement, which struggle to resolve occlusions in sparse-view scenarios. Furthermore, standard volumetric representations such as Cartesian Triplanes are poorly suited to capturing the inherent geometry of $360^\circ$ scenes, leading to distortion and aliasing. In this work, we introduce CylinderSplat, a feed-forward framework for panoramic 3DGS that addresses these limitations. The core of our method is a new cylindrical Triplane representation, which is better aligned with panoramic data and with real-world structures that adhere to the Manhattan-world assumption. We use a dual-branch architecture: a pixel-based branch reconstructs well-observed regions, while a volume-based branch leverages the cylindrical Triplane to complete occluded or sparsely viewed areas. Our framework is designed to flexibly handle a variable number of input views, from a single panorama to multiple panoramas. Extensive experiments demonstrate that CylinderSplat achieves state-of-the-art results in both single-view and multi-view panoramic novel view synthesis, outperforming previous methods in both reconstruction quality and geometric accuracy.
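
As a rough illustration of what a cylindrical Triplane lookup could look like, the sketch below converts Cartesian points to cylindrical coordinates (rho, phi, z) and derives normalized coordinates on three planes. The function names, the plane names (`rho_phi`, `phi_z`, `rho_z`), the z-up axis convention, and the bounds `rho_max`, `z_min`, `z_max` are all assumptions made for illustration; the abstract does not specify the paper's actual parameterization.

```python
import numpy as np


def cartesian_to_cylindrical(points):
    """Convert (N, 3) Cartesian points (x, y, z; z assumed up) to (rho, phi, z)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)        # radial distance from the vertical axis
    phi = np.arctan2(y, x)      # azimuth in [-pi, pi]
    return rho, phi, z


def cylindrical_triplane_coords(points, rho_max, z_min, z_max):
    """Return normalized [0, 1] lookup coordinates on three hypothetical planes.

    rho_max, z_min and z_max are illustrative scene bounds, not values from the paper.
    """
    rho, phi, z = cartesian_to_cylindrical(points)
    u_rho = np.clip(rho / rho_max, 0.0, 1.0)
    u_phi = (phi + np.pi) / (2.0 * np.pi)                   # azimuth mapped to [0, 1]
    u_z = np.clip((z - z_min) / (z_max - z_min), 0.0, 1.0)
    return {
        "rho_phi": np.stack([u_rho, u_phi], axis=-1),  # horizontal (top-down) plane
        "phi_z": np.stack([u_phi, u_z], axis=-1),      # unrolled cylinder surface
        "rho_z": np.stack([u_rho, u_z], axis=-1),      # radial cross-section
    }


if __name__ == "__main__":
    pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
    coords = cylindrical_triplane_coords(pts, rho_max=2.0, z_min=-1.0, z_max=1.0)
    for name, uv in coords.items():
        print(name, uv)
```

In a sketch like this, the `phi_z` plane has the same azimuth-by-height layout as an unrolled panorama, which is one way to read the abstract's claim that the cylindrical form is better aligned with panoramic data.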

Paper Structure

This paper contains 33 sections, 10 equations, 12 figures, and 16 tables.

Figures (12)

  • Figure 1: This paper introduces CylinderSplat, a feed-forward panoramic 3D Gaussian Splatting (3DGS) framework for panoramic novel view synthesis from single (left) or sparse (right) input views.
  • Figure 2: Visualization of the Triplane representation in (a) Cartesian, (b) Spherical, and (c) Cylindrical coordinate systems. (d) The corresponding unit volume elements for each system.
  • Figure 3: Overview of our CylinderSplat framework. Our method uses a dual-branch architecture trained via a three-stage curriculum. The pixel branch uses a multi-view attention mechanism to generate high-quality Gaussians for well-observed regions. The volume branch fills the remaining gaps by lifting features into our cylindrical Triplane representation, thereby completing the scene geometry robustly. The outputs from both branches are then unified for a final render.
  • Figure 4: Qualitative comparison on 360Loc (two-view). Left: input/target views. Right: zoomed-in novel views and depth maps (warm = far, cool = near). The ground-truth (GT) depth is obtained from DepthAnywhere [wang2024depth] and serves as the reference for calculating PCC.
  • Figure 5: Ablation study visualizations on Matterport3D (leftmost column is ground truth). Among the different coordinate systems for the Triplane, the Cartesian version shows significant distortion, and the spherical version struggles with distant rooms. Our cylindrical Triplane performs best. While the pixel branch is high-quality in visible areas, combining it with the Triplane branch improves reconstruction of distant regions.
  • ...and 7 more figures
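
For reference, the unit volume elements mentioned in the Figure 2(d) caption take the following standard forms in the three coordinate systems. The symbols ($r$, $\theta$, $\varphi$ for spherical; $\rho$, $\varphi$, $z$ for cylindrical) are assumed here and may differ from the paper's notation.

```latex
\begin{aligned}
\text{Cartesian:}   \quad & dV = dx\,dy\,dz \\
\text{Spherical:}   \quad & dV = r^{2}\sin\theta\;dr\,d\theta\,d\varphi \\
\text{Cylindrical:} \quad & dV = \rho\;d\rho\,d\varphi\,dz
\end{aligned}
```

On a uniform grid, a spherical cell's volume grows quadratically with radius, while a cylindrical cell's volume grows only linearly with $\rho$ and does not depend on height. This may be related to the Figure 5 observation that the spherical Triplane struggles with distant rooms, though the listing here does not state the paper's own explanation.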