SphereDiff: Tuning-free 360° Static and Dynamic Panorama Generation via Spherical Latent Representation
Minho Park, Taewoong Kang, Jooyeol Yun, Sungwon Hwang, Jaegul Choo
TL;DR
SphereDiff tackles the distortions inherent in 360° panorama generation by introducing a spherical latent representation whose points are distributed on a Fibonacci lattice, giving uniform coverage and consistent quality across all view directions, including the poles. It extends MultiDiffusion to operate in this spherical latent space, uses dynamic latent sampling to map spherical latents to perspective views, and applies distortion-aware weighted averaging and multi-prompt inference. Together these enable tuning-free generation of high-quality static and live 360° wallpapers. SphereDiff demonstrates state-of-the-art performance on panoramic criteria (distortion and end continuity) while remaining compatible with different diffusion backbones, highlighting its potential as a robust foundation for immersive AR/VR content. Noted limitations concern indoor scene generation and data constraints for panoramic video, along with runtime and the choice of diffusion backbone; stronger backbones and broader datasets are suggested as directions for improvement.
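To make the Fibonacci lattice concrete, here is a minimal sketch of the standard construction for placing points approximately uniformly on a unit sphere. It is not taken from the SphereDiff code: the function name `fibonacci_lattice` and the point count of 2048 are assumptions chosen for illustration, and the paper's actual latent layout may differ in detail.

```python
import numpy as np


def fibonacci_lattice(n: int) -> np.ndarray:
    """Return n roughly uniformly spaced unit vectors, shape (n, 3).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    i = np.arange(n)
    # Golden angle in radians: successive points rotate by this much in longitude.
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    # Evenly spaced z values give equal-area latitude bands on the sphere.
    z = 1.0 - (2.0 * i + 1.0) / n
    r = np.sqrt(1.0 - z * z)  # radius of the latitude circle at height z
    theta = golden_angle * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)


# Assumed point count, for illustration only.
points = fibonacci_lattice(2048)

# Fraction of points inside the polar cap z > 0.9. The cap covers 5% of the
# sphere's area, so a uniform layout should place about 5% of the points there.
polar_fraction = float(np.mean(points[:, 2] > 0.9))
```

Because the z coordinates are evenly spaced, each latitude band receives a share of points proportional to its area, so the polar regions are neither over- nor under-sampled.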
Abstract
The increasing demand for AR/VR applications has highlighted the need for high-quality content, such as 360° live wallpapers. However, generating high-quality 360° panoramic content remains a challenging task due to the severe distortions introduced by equirectangular projection (ERP). Existing approaches either fine-tune pretrained diffusion models on limited ERP datasets or adopt tuning-free methods that still rely on ERP latent representations, often resulting in distracting distortions near the poles. In this paper, we introduce SphereDiff, a novel approach for synthesizing 360° static and live wallpapers with state-of-the-art diffusion models without additional tuning. We define a spherical latent representation that ensures consistent quality across all perspectives, including near the poles. Then, we extend MultiDiffusion to the spherical latent representation and propose a dynamic spherical latent sampling method to enable direct use of pretrained diffusion models. Moreover, we introduce distortion-aware weighted averaging to further improve the generation quality. Our method outperforms existing approaches in generating 360° static and live wallpapers, making it a robust solution for immersive AR/VR applications. The code is available at https://github.com/pmh9960/SphereDiff
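The ERP distortion mentioned above can be quantified with a short sketch. In an equirectangular image every row has the same number of pixels, but the solid angle covered by one pixel shrinks approximately in proportion to the cosine of its latitude, so rows near the poles are heavily oversampled. The function name `erp_row_area_weights` and the image height of 512 are assumptions for illustration and do not come from the paper.

```python
import numpy as np


def erp_row_area_weights(height: int) -> np.ndarray:
    """Approximate relative solid angle of one pixel in each ERP row.

    Hypothetical helper for illustration; values are close to 1 at the
    equator and close to 0 at the poles.
    """
    # Latitude of each row centre, from just below +pi/2 (top row)
    # to just above -pi/2 (bottom row).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    return np.cos(lat)


# Assumed ERP height, for illustration only.
weights = erp_row_area_weights(512)

# How many times smaller a top-row pixel's footprint is than an equatorial one.
pole_to_equator_ratio = float(weights[0] / weights[len(weights) // 2])
```

A representation with uniform point density on the sphere, such as the spherical latent described in the abstract, avoids this latitude-dependent imbalance.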
