Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

Mengyang Feng, Jinlin Liu, Miaomiao Cui, Xuansong Xie

TL;DR

The work tackles seamless $360^\circ\times 180^\circ$ panoramic image generation with diffusion models by introducing a circular blending mechanism that enforces left–right geometric continuity during denoising and VAE decoding. It presents two task frameworks. Text-to-360-Panoramas is a multi-stage pipeline that combines a DreamBooth model fine-tuned on SUN360 with super-resolution (SR) modules such as ControlNet-Tile and RealESRGAN. Single-Image-to-360-Panoramas uses a ControlNet-Outpainting approach based on cube-map perspectives. The reported results show improved border continuity and high-resolution outputs, with two noted limitations: stylization flexibility is restricted by the DreamBooth base model, and the SR steps leave residual artifacts. The approach offers a practical route to seamless, high-quality 360-degree panoramas by combining circular blending with existing upscaling techniques.

Abstract

This is a technical report on the 360-degree panoramic image generation task based on diffusion models. Unlike ordinary 2D images, 360-degree panoramic images capture the entire $360^\circ\times 180^\circ$ field of view, so the rightmost and leftmost sides of a 360-degree panoramic image should be continuous, which is the main challenge in this field. However, the current diffusion pipeline is not suited to generating such seamless 360-degree panoramic images. To this end, we propose a circular blending strategy in both the denoising and VAE decoding stages to maintain geometric continuity. Based on this, we present two models for the Text-to-360-panoramas and Single-Image-to-360-panoramas tasks. The code has been released as an open-source project at https://github.com/ArcherFMY/SD-T2I-360PanoImage and on ModelScope (https://www.modelscope.cn/models/damo/cv_diffusion_text-to-360panorama-image_generation/summary).
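
The circular blending idea mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the latent (or decoded image) has been extended on the right by a strip covering the same region as its leftmost columns, and that the two are mixed with a linear weight ramp. The function name `circular_blend`, the `overlap` parameter, the extended-array layout, and the linear weights are all hypothetical choices made for illustration.

```python
import numpy as np


def circular_blend(extended, overlap):
    """Blend a wrap-around strip into the left edge, then crop to the original width.

    Assumption (not from the report): `extended` has shape (..., W + overlap),
    and its last `overlap` columns continue the scene past the right edge,
    so they cover the same region as the first `overlap` columns.
    """
    width = extended.shape[-1] - overlap
    if overlap <= 0 or overlap > width:
        raise ValueError("overlap must be in the range 1..W")

    # Weight of the original left columns: 0 at the seam, rising toward 1.
    # A linear ramp is an illustrative choice only.
    w = np.linspace(0.0, 1.0, overlap, endpoint=False)

    out = extended[..., :width].astype(float)  # astype returns a new array
    left = extended[..., :overlap]
    wrap = extended[..., width:]

    # Column 0 takes the wrap-around value, which follows on from column W-1,
    # and later columns fade back to the original left content.
    out[..., :overlap] = w * left + (1.0 - w) * wrap
    return out
```

On a non-periodic ramp such as `np.arange(12.0)` with `overlap=4`, the cropped result has width 8 and the gap between its last and first columns shrinks from 7 to 1. In a diffusion pipeline, a blend of this kind would be applied to the latent at denoising steps and again around VAE decoding, as the abstract describes; how the extension strip is produced is outside the scope of this sketch.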

Paper Structure (7 sections, 7 figures)

Figures (7)

  • Figure 1: The circular blending operation in different stages.
  • Figure 2: The pipeline of Text-to-360-Panoramas.
  • Figure 3: Results from the Base Model.
  • Figure 4: Results from Base+InitSR.
  • Figure 5: Results from Base+InitSR+RealESRGAN. The rightmost and leftmost sides of our results join smoothly, with almost no visible cracks. Some artifacts in the top two rows are caused by RealESRGAN.
  • ...and 2 more figures