PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting

Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung, Jianfei Cai

TL;DR

PanSplat tackles high-resolution panorama synthesis by extending feed-forward 3D Gaussian splatting to spherical panorama geometry. It introduces a spherical 3D Gaussian pyramid placed on a Fibonacci lattice, a hierarchical spherical cost volume with transformer-backed feature extraction, and Gaussian heads that predict multi-scale Gaussian parameters, rendered via a cubemap pipeline. A two-step deferred backpropagation strategy and deferred blending address memory constraints, enabling 4K ($2048 \times 4096$) synthesis on a single A100 GPU and achieving state-of-the-art results with substantial speedups over prior methods. The approach yields sharp, high-frequency textures and improved geometry, demonstrating strong generalization to real-world data and broad VR-relevant applications, though dynamic scenes remain future work.

Abstract

With the advent of portable 360° cameras, panorama has gained significant attention in applications like virtual reality (VR), virtual tours, robotics, and autonomous driving. As a result, wide-baseline panorama view synthesis has emerged as a vital task, where high resolution, fast inference, and memory efficiency are essential. Nevertheless, existing methods are typically constrained to lower resolutions (512 $\times$ 1024) due to demanding memory and computational requirements. In this paper, we present PanSplat, a generalizable, feed-forward approach that efficiently supports resolution up to 4K (2048 $\times$ 4096). Our approach features a tailored spherical 3D Gaussian pyramid with a Fibonacci lattice arrangement, enhancing image quality while reducing information redundancy. To accommodate the demands of high resolution, we propose a pipeline that integrates a hierarchical spherical cost volume and Gaussian heads with local operations, enabling two-step deferred backpropagation for memory-efficient training on a single A100 GPU. Experiments demonstrate that PanSplat achieves state-of-the-art results with superior efficiency and image quality across both synthetic and real-world datasets. Code is available at https://github.com/chengzhag/PanSplat.

Paper Structure

This paper contains 20 sections, 4 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: PanSplat generates novel views from two 4K (2048 $\times$ 4096) panoramas. We train on rendered Matterport3D [chang2017matterport3d] data at 4K resolution (left), and the model generalizes to 4K real-world data (right) after a small amount of fine-tuning on 360Loc [huang2024360loc] data (zoom in for details). Please refer to the supplementary video for more results.
  • Figure 2: Fibonacci Gaussians. We propose a Fibonacci lattice arrangement for the Gaussians to be distributed uniformly across the sphere, avoiding information redundancy near the poles, and significantly reducing the number of required Gaussians.
  • Figure 3: Our proposed PanSplat pipeline. Given two wide-baseline panoramas, we first construct a hierarchical spherical cost volume, using a Transformer-based FPN to extract a feature pyramid and 2D U-Nets to integrate monocular depth priors for cost volume refinement. We then build Gaussian heads to generate a feature pyramid, which is later sampled with a Fibonacci lattice and transformed into a spherical 3D Gaussian pyramid. Finally, we unproject the Gaussian parameters for each level and view, consolidate them into a global representation, and splat it into novel views using a cubemap renderer. For simplicity, intermediate results of only a single view are shown.
  • Figure 4: Qualitative comparisons on synthetic datasets. We show the input panorama pairs and the ground truth novel views on the left, and compare the zoomed-in results on the right to highlight the differences. Our PanSplat generates overall sharper images with more high-frequency details and improved geometry.
  • Figure 5: Qualitative comparisons for the ablation study. Our Fibonacci Gaussians (+Fibo) reduce the Gaussian count without compromising image quality, and our 3D Gaussian Pyramid (+3DGP) further enhances quality.
  • ...and 13 more figures
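
To make the Fibonacci lattice arrangement from Figure 2 concrete, here is a minimal sketch of the standard golden-ratio construction for spreading points near-uniformly over a sphere. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fibonacci_lattice`, `to_equirect_pixel`, `polar_fraction`) are hypothetical, and the page above does not specify how many points PanSplat uses per pyramid level or how it samples features at those points.

```python
import math


def fibonacci_lattice(n):
    """Return n (lat, lon) pairs in radians, spread near-uniformly on the unit sphere."""
    golden_ratio = (1 + math.sqrt(5)) / 2
    points = []
    for i in range(n):
        # Equal steps in z = sin(lat) produce equal-area latitude bands.
        z = 1 - (2 * i + 1) / n
        lat = math.asin(z)
        # Advance longitude by a golden-ratio fraction of a full turn per point.
        lon = 2 * math.pi * ((i / golden_ratio) % 1.0) - math.pi
        points.append((lat, lon))
    return points


def to_equirect_pixel(lat, lon, height, width):
    """Map (lat, lon) to continuous (row, col) coordinates in an equirectangular image."""
    row = (0.5 - lat / math.pi) * height
    col = (lon / (2 * math.pi) + 0.5) * width
    return row, col


def polar_fraction(points, min_abs_lat):
    """Fraction of points whose absolute latitude is at least min_abs_lat."""
    return sum(1 for lat, _ in points if abs(lat) >= min_abs_lat) / len(points)


if __name__ == "__main__":
    pts = fibonacci_lattice(10000)
    # Share of lattice points within 30 degrees of either pole.
    print(polar_fraction(pts, math.pi / 3))
    # Share of the sphere's area in the same region.
    print(1 - math.sin(math.pi / 3))
```

The two printed values should be close, because the lattice allocates points in proportion to surface area. A per-pixel equirectangular grid, by contrast, places one third of its rows within 30 degrees of the poles, although that region covers only about 13.4% of the sphere. This is the polar redundancy that the Figure 2 caption refers to.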