PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms

Yifei Xia; Shuchen Weng; Siqi Yang; Jingqi Liu; Chengxuan Zhu; Minggui Teng; Zijian Jia; Han Jiang; Boxin Shi

TL;DR

PanoWan addresses the challenge of generating high-quality 360° panoramic videos by lifting the priors of a pre-trained text-to-video diffusion model to the panoramic domain with a small number of lightweight modules. It introduces latitude-aware sampling to mitigate latitudinal distortion, rotated semantic denoising to achieve seamless transitions across the longitude boundary, and padded pixel-wise decoding to reduce boundary artifacts, all supported by the newly released PanoVid dataset. The approach achieves state-of-the-art results on panoramic video generation metrics and robust zero-shot performance on downstream tasks, while enabling practical editing and long-video generation. This work narrows the gap between the priors of conventional video models and panoramic geometry, enabling scalable, coherent 360° content creation from text descriptions.

Abstract

Panoramic video generation enables immersive 360° content creation, which is valuable in applications that demand scene-consistent world exploration. However, existing panoramic video generation models struggle to leverage the pre-trained generative priors of conventional text-to-video models to produce high-quality and diverse panoramic videos, owing to limited dataset scale and the gap in spatial feature representations. In this paper, we introduce PanoWan, which effectively lifts pre-trained text-to-video models to the panoramic domain using only minimal additional modules. PanoWan employs latitude-aware sampling to avoid latitudinal distortion, while its rotated semantic denoising and padded pixel-wise decoding ensure seamless transitions at longitude boundaries. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios. Consequently, PanoWan achieves state-of-the-art performance in panoramic video generation and demonstrates robustness on zero-shot downstream tasks. Our project page is available at https://panowan.variantconst.com.
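
The abstract names the mechanisms but does not specify them, so the following is only a minimal sketch of the equirectangular geometry they rely on, not PanoWan's implementation. It assumes arrays whose last two axes are height (latitude) and width (longitude). The helper names `roll_longitude`, `pad_longitude`, `crop_longitude`, and `latitude_weights` are hypothetical and chosen for illustration. Rolling along the width axis corresponds to rotating the panorama about the vertical axis, circular padding lets the left and right borders see each other's content, and the cosine of latitude describes how much sphere area each pixel row covers.

```python
import numpy as np


def roll_longitude(x, shift):
    """Rotate an equirectangular array about the vertical axis.

    Rolling the width (longitude) axis moves the seam to a new column
    without changing the content of the panorama.
    """
    return np.roll(x, shift, axis=-1)


def pad_longitude(x, pad):
    """Circularly pad the width axis with `pad` columns on each side."""
    if pad < 1:
        raise ValueError("pad must be at least 1")
    return np.concatenate([x[..., -pad:], x, x[..., :pad]], axis=-1)


def crop_longitude(x, pad):
    """Remove the columns added by pad_longitude."""
    if pad < 1:
        raise ValueError("pad must be at least 1")
    return x[..., pad:-pad]


def latitude_weights(height):
    """cos(latitude) at each pixel-row center, from +pi/2 (top) to -pi/2 (bottom).

    Rows near the poles cover less area on the sphere, so their weight is
    smaller; this is the source of latitudinal distortion.
    """
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    return np.cos(lat)
```

In this sketch, a decoder that is not aware of the wrap-around could be applied to `pad_longitude(x, pad)` and its output cropped back with `crop_longitude`, while `roll_longitude` could be used to move the seam between denoising steps. How PanoWan actually schedules these operations is described in the paper, not here.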
