
PanoDreamer: Optimization-Based Single Image to 360 3D Scene With Diffusion

Avinash Paliwal, Xilong Zhou, Andrii Tsarov, Nima Khademi Kalantari

TL;DR

PanoDreamer tackles the challenge of producing coherent 360° 3D scenes from a single image by decoupling panorama generation from depth estimation and formulating each as an optimization problem solved by alternating minimization. The pipeline then inpaints occluded regions and reconstructs the scene as a 3D Gaussian splatting (3DGS) representation, using a four-layer layered depth image (LDI) for depth-aware texture completion. The key contributions are MultiConDiffusion, a two-stage optimization for panorama generation, and PanoDepthFusion, a patch-wise fusion of depth estimates into a consistent panoramic depth map. These are followed by an end-to-end 3DGS optimization with depth-guided losses. The method achieves better global coherence and detail in wide-view renderings than existing approaches, with practical relevance for VR/AR and immersive visualization. The authors identify horizon-bounded inputs and occasional blur artifacts at edges as limitations and avenues for future refinement.

Abstract

In this paper, we present PanoDreamer, a novel method for producing a coherent 360° 3D scene from a single input image. Unlike existing methods that generate the scene sequentially, we frame the problem as single-image panorama and depth estimation. Once the coherent panoramic image and its corresponding depth are obtained, the scene can be reconstructed by inpainting the small occluded regions and projecting them into 3D space. Our key contribution is formulating single-image panorama and depth estimation as two optimization tasks and introducing alternating minimization strategies to effectively solve their objectives. We demonstrate that our approach outperforms existing techniques in single-image 360° 3D scene reconstruction in terms of consistency and overall quality.
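The abstract frames panorama generation as an optimization solved by alternating minimization, and Figure 1 describes that loop for MultiConDiffusion. Below is a minimal NumPy sketch of the control flow only, under several assumptions. `toy_denoise_step` is a hypothetical placeholder for one reverse step of a conditional diffusion model, not the paper's model. Crops are taken along the width and wrap around the 360° seam. Overlapping crop outputs are aggregated by a plain per-pixel average, as in uniform-weight MultiDiffusion. The function names are illustrative, and a fixed number of outer iterations stands in for the paper's "until convergence".

```python
import numpy as np


def toy_denoise_step(x_crop, cond_crop, t):
    """Hypothetical stand-in for one reverse-diffusion step of a conditional model.

    It pulls the crop toward a smoothed copy of its condition; at t == 1 it
    returns that target, mimicking a fully denoised output.
    """
    target = 0.5 * cond_crop + 0.5 * cond_crop.mean()
    return x_crop + (1.0 / t) * (target - x_crop)


def multidiffusion_pass(cond, T, crop, stride, rng):
    """Stage 1: denoise from noise with a FIXED condition, aggregating overlapping crops.

    Assumes stride <= crop <= width so every column is covered at least once.
    """
    width = cond.shape[-1]
    x = rng.standard_normal(cond.shape)  # J_T: start from Gaussian noise
    for t in range(T, 0, -1):
        acc = np.zeros_like(x)
        cnt = np.zeros_like(x)
        for start in range(0, width, stride):
            idx = (start + np.arange(crop)) % width  # wrap across the 360° seam
            acc[..., idx] += toy_denoise_step(x[..., idx], cond[..., idx], t)
            cnt[..., idx] += 1
        x = acc / cnt  # per-pixel average of overlapping crop outputs
    return x  # J_0


def multicondiffusion(L, n_outer, T, crop, stride, seed=0):
    """Alternate stage 1 (denoise with a fixed condition) and stage 2 (swap the condition)."""
    rng = np.random.default_rng(seed)
    cond = L.copy()
    for _ in range(n_outer):
        j0 = multidiffusion_pass(cond, T, crop, stride, rng)
        cond = j0  # stage 2: the denoised panorama becomes the next condition
    return cond


# Usage: a 4x32 "black canvas" panorama with the input image in the first 8 columns.
L = np.zeros((4, 32))
L[:, :8] = 1.0
pano = multicondiffusion(L, n_outer=3, T=5, crop=8, stride=4)
print(pano.shape)  # (4, 32)
```

Because this toy step returns its target at t = 1, the result here does not depend on the random seed; a real diffusion model would not behave this way. The sketch is only meant to show where the fixed condition, the crop aggregation, and the condition swap sit in the loop.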

Paper Structure

This paper contains 24 sections, 8 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: We provide an overview of our proposed MultiConDiffusion process, which consists of two stages. In the first stage, we fix the input condition $L$ and apply the diffusion model to overlapping crops of the image at the current time step. The outputs are then aggregated to produce the image at the next time step. This process is repeated until the fully denoised image $J_0$ is obtained. In the next stage, we replace the current input condition with $J_0$. These two stages are repeated until convergence.
  • Figure 2: We compare the results of our MultiConDiffusion process against MultiDiffusion and progressive inpainting. The green bar shows the location of the input image. We show MultiDiffusion results with two different input conditions (shown on the top left): a black canvas containing the input image (second row) and the progressive inpainting result (third row). Our method produces coherent results, while the alternative approaches produce images with seams and inconsistencies.
  • Figure 3: We compare the result of our method, PanoDepthFusion, against applying Depth Anything V2 (DA V2) [yang2024depthv2] to the full image. The depth obtained by DA V2 lacks detail and is geometrically inconsistent. Our approach, on the other hand, produces highly detailed and consistent depth maps.
  • Figure 4: Averaging the patch depth estimates leads to banding artifacts because the depth maps are relative and therefore mutually inconsistent. On the top right, we show that projecting the image into 3D using such a depth map results in clear banding artifacts. Since we initialize $G_{\theta_i}$ with the identity function, the patchwise average serves as our initial depth estimate during the optimization of Eq. \ref{eq:pano_depth}. We also show our results after one and four iterations of optimization. After only four iterations, the seams disappear, and, as seen on the bottom right, so do the banding artifacts in the projected image.
  • Figure 5: We show an overview of PanoDepthFusion. We first apply an existing depth estimator to overlapping patches of the input image to obtain a set of patch depth estimates. We then perform optimization in two stages. In the first stage, the depth patches are adjusted using piecewise linear functions $G_{\theta_i}$, and the adjusted patches are aggregated to obtain the panoramic depth. In the second stage, we optimize the parameters $\theta_i$ of these functions so that the adjusted patch depth estimates match the corresponding regions of the panoramic depth. These two stages are repeated until convergence.
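Figures 4 and 5 describe PanoDepthFusion as alternating between aggregating adjusted patch depths and refitting each patch's piecewise linear function $G_{\theta_i}$. Below is a minimal NumPy sketch of that alternation under stated assumptions, not the authors' implementation. Each $\theta_i$ is taken to be the values of $G_{\theta_i}$ at fixed, evenly spaced knots over that patch's depth range. Aggregation is a per-pixel average, the refit is an unregularized least-squares problem, and patches wrap around the 360° seam. The patch "estimates" are simulated as affine distortions of a synthetic depth map, standing in for an existing depth estimator. All function names are hypothetical.

```python
import numpy as np


def patch_indices(start, patch_w, width):
    # Column indices of one patch; the modulo wraps across the 360° seam.
    return (start + np.arange(patch_w)) % width


def apply_g(d, knots, values):
    # Piecewise linear G_theta: knot positions -> knot values (theta).
    return np.interp(d.ravel(), knots, values).reshape(d.shape)


def fit_g(d, target, knots):
    # G is linear in its knot values, so fitting theta is ordinary least squares
    # on a basis of "hat" functions (interpolated unit vectors).
    x = d.ravel()
    eye = np.eye(len(knots))
    basis = np.stack([np.interp(x, knots, eye[k]) for k in range(len(knots))], axis=1)
    values, *_ = np.linalg.lstsq(basis, target.ravel(), rcond=None)
    return values


def fuse(adjusted, idxs, shape):
    # Stage 1: per-pixel average of the adjusted, overlapping patches.
    acc, cnt = np.zeros(shape), np.zeros(shape)
    for p, idx in zip(adjusted, idxs):
        acc[:, idx] += p
        cnt[:, idx] += 1
    return acc / cnt


def pano_depth_fusion(patch_depths, idxs, shape, n_knots=5, n_iters=10):
    knots = [np.linspace(d.min(), d.max(), n_knots) for d in patch_depths]
    thetas = [k.copy() for k in knots]  # identity init: first fused map = patchwise average
    errors, pano = [], None
    for _ in range(n_iters):
        adjusted = [apply_g(d, k, th) for d, k, th in zip(patch_depths, knots, thetas)]
        pano = fuse(adjusted, idxs, shape)
        # Total squared disagreement between adjusted patches and the fused panorama.
        errors.append(sum(float(np.sum((a - pano[:, idx]) ** 2)) for a, idx in zip(adjusted, idxs)))
        # Stage 2: refit each patch's G_theta to its region of the panoramic depth.
        thetas = [fit_g(d, pano[:, idx], k) for d, idx, k in zip(patch_depths, idxs, knots)]
    return pano, errors


# Usage on synthetic data: each patch sees the true depth up to its own scale and shift.
H, W, PW, STRIDE = 4, 64, 16, 8
true_depth = 2.0 + np.sin(2 * np.pi * np.arange(W) / W) + 0.1 * np.arange(H)[:, None]
rng = np.random.default_rng(0)
idxs = [patch_indices(s, PW, W) for s in range(0, W, STRIDE)]
patches = [rng.uniform(0.5, 2.0) * true_depth[:, idx] + rng.uniform(-1.0, 1.0) for idx in idxs]
pano, errors = pano_depth_fusion(patches, idxs, (H, W))
print(f"disagreement: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Both half-steps minimize the same total squared disagreement (the average is its minimizer over the panorama, and least squares is its minimizer over each $\theta_i$), so the recorded error never increases. With the identity initialization, the first fused map is exactly the banding-prone patchwise average from Figure 4. This toy objective leaves a global scale and shift undetermined and would also be reduced by collapsing to a constant depth; how the paper's formulation handles this is not covered by these captions.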