360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang, Weiqi Li, Chong Mou, Xinhua Cheng, Jian Zhang
TL;DR
The paper addresses the challenge of generating high-quality 360° panorama videos without expensive capture by introducing 360DVD, a diffusion-based pipeline that repurposes pre-trained text-to-video models through a lightweight 360-Adapter. It uses a new WEB360 dataset and a 360 Text Fusion captioning strategy to train the model on panorama-specific content, together with enhancement techniques such as a latitude-aware loss and wraparound-consistent mechanisms that improve continuity and motion fidelity. The results show that 360DVD produces text-aligned, coherent panorama videos across multiple styles and can follow motion guidance from optical flow while keeping the content distribution consistent with real panoramas. This work enables versatile, prompt-driven panorama video generation suitable for VR, entertainment, and educational applications, while preserving the priors learned by large diffusion models and allowing easy adaptation to personalized T2I models.
Abstract
Panorama video has recently attracted growing interest in both research and applications because of the immersive experience it offers. Because capturing 360-degree panoramic videos is expensive, generating panorama videos from prompts is highly desirable. Recently emerging text-to-video (T2V) diffusion methods have proven notably effective for standard video generation. However, owing to the significant gap in content and motion patterns between panoramic and standard videos, these methods struggle to produce satisfactory 360-degree panoramic videos. In this paper, we propose a pipeline named 360-Degree Video Diffusion model (360DVD) for generating 360-degree panoramic videos from given prompts and motion conditions. Specifically, we introduce a lightweight 360-Adapter accompanied by 360 Enhancement Techniques to transform pre-trained T2V models for panorama video generation. We further propose a new panorama dataset named WEB360 consisting of panoramic video-text pairs for training 360DVD, addressing the absence of captioned panoramic video datasets. Extensive experiments demonstrate the superiority and effectiveness of 360DVD for panorama video generation. Our project page is at https://akaneqwq.github.io/360DVD/.
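
The TL;DR names a latitude-aware loss and wraparound-consistent mechanisms without giving their form. As an illustration only, the sketch below shows one plausible reading for equirectangular frames: rows are weighted by the cosine of their latitude (an assumption borrowed from common spherical-image metrics, not the paper's stated formula), and the left and right borders are padded circularly so that operations near the seam see the content from the opposite edge. The function names `latitude_weights`, `latitude_weighted_mse`, and `wrap_pad_width` are hypothetical and do not come from the 360DVD code.

```python
import numpy as np


def latitude_weights(height: int) -> np.ndarray:
    """Per-row weights for an equirectangular frame.

    Assumption: weight = cos(latitude) at each row center, so rows near
    the poles (which are stretched by the projection) count less.
    """
    # Map row centers to latitudes in (-pi/2, pi/2).
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2
    return np.cos(lat)


def latitude_weighted_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Weighted mean squared error over arrays shaped (..., H, W)."""
    height = pred.shape[-2]
    w = latitude_weights(height)[:, None]  # shape (H, 1), broadcasts over W
    sq_err = (pred - target) ** 2
    # Normalize by the total weight so a constant error e yields e**2.
    total_weight = np.broadcast_to(w, sq_err.shape).sum()
    return float((sq_err * w).sum() / total_weight)


def wrap_pad_width(x: np.ndarray, pad: int) -> np.ndarray:
    """Circularly pad the last (longitude) axis by `pad` columns per side."""
    if pad < 1 or pad > x.shape[-1]:
        raise ValueError("pad must be between 1 and the width of x")
    # The left border receives the rightmost columns and vice versa.
    return np.concatenate([x[..., -pad:], x, x[..., :pad]], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pred = rng.normal(size=(2, 8, 16))    # (frames, H, W)
    target = rng.normal(size=(2, 8, 16))
    print("weighted MSE:", latitude_weighted_mse(pred, target))
    print("padded shape:", wrap_pad_width(pred, 2).shape)
```

Normalizing by the total weight keeps the loss on the same scale as an unweighted MSE, which makes the two easy to compare. The circular padding applies only to the longitude axis, since the top and bottom of an equirectangular frame do not wrap onto each other.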
