Imagine360: Immersive 360 Video Generation from Perspective Anchor
Jing Tan, Shuai Yang, Tong Wu, Jingwen He, Yuwei Guo, Ziwei Liu, Dahua Lin
TL;DR
Imagine360 generates high-quality 360° videos from perspective anchors by bridging the perspective and panorama domains with a dual-branch diffusion framework, augmented with cross-domain spherical attention and an antipodal mask that captures long-range panoramic motion. Elevation-aware training and inference components handle varying input elevations, while resource-efficient fine-tuning (LoRA) enables learning from limited panorama data. The approach is trained on a collected dataset (10,744 samples from WEB360 and YouTube), evaluated with quantitative metrics (VBench and Q-Align) and human studies, and demonstrates superior performance in both panorama video quality and panorama image outpainting. Overall, Imagine360 advances personalized, immersive 360° video creation by transferring local perspective content into globally coherent spherical motion.
Abstract
$360^\circ$ videos offer a hyper-immersive experience that allows viewers to explore a dynamic scene across the full 360 degrees. To achieve more user-friendly and personalized content creation in the $360^\circ$ video format, we seek to lift standard perspective videos into $360^\circ$ equirectangular videos. To this end, we introduce Imagine360, the first perspective-to-$360^\circ$ video generation framework that creates high-quality $360^\circ$ videos with rich and diverse motion patterns from video anchors. Imagine360 learns fine-grained spherical visual and motion patterns from limited $360^\circ$ video data with several key designs. 1) First, we adopt a dual-branch design, comprising a perspective and a panorama video denoising branch that provide local and global constraints for $360^\circ$ video generation, with the motion module and spatial LoRA layers fine-tuned on extended web $360^\circ$ videos. 2) Additionally, an antipodal mask is devised to capture long-range motion dependencies, enhancing the reversed camera motion between antipodal pixels across hemispheres. 3) To handle diverse perspective video inputs, we propose elevation-aware designs that adapt to the varying video masking caused by changing elevations across frames. Extensive experiments show that Imagine360 achieves superior graphics quality and motion coherence compared with state-of-the-art $360^\circ$ video generation methods. We believe Imagine360 holds promise for advancing personalized, immersive $360^\circ$ video creation.
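To make the notion of antipodal pixels concrete, the following is a minimal Python sketch of the underlying geometry only. The abstract does not specify how the antipodal mask is constructed or where it is applied inside the network, so the function names `antipodal_index` and `antipodal_mask`, the flattened boolean-matrix layout, and the assumption of pixel-centered sampling with an even frame width are all illustrative choices, not the authors' implementation. In an equirectangular frame, the antipode of a point at longitude $\lambda$ and latitude $\phi$ lies at longitude $\lambda + 180^\circ$ and latitude $-\phi$, which corresponds to a half-width horizontal shift and a vertical flip.

```python
import numpy as np


def antipodal_index(height, width):
    """For each equirectangular pixel (row, col), return the (row, col) of its antipode.

    Hypothetical helper: assumes pixel-centered sampling and an even width,
    so that a 180-degree longitude shift is a whole number of pixels.
    """
    if width % 2 != 0:
        raise ValueError("width must be even for an exact 180-degree shift")
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    anti_rows = height - 1 - rows            # latitude phi -> -phi (vertical flip)
    anti_cols = (cols + width // 2) % width  # longitude lambda -> lambda + 180 degrees
    return anti_rows, anti_cols


def antipodal_mask(height, width):
    """Boolean (H*W, H*W) matrix: entry [i, j] is True when flattened pixel j
    is the antipode of flattened pixel i (row-major flattening)."""
    anti_rows, anti_cols = antipodal_index(height, width)
    n = height * width
    mask = np.zeros((n, n), dtype=bool)
    src = np.arange(n)
    dst = (anti_rows * width + anti_cols).ravel()
    mask[src, dst] = True
    return mask


if __name__ == "__main__":
    mask = antipodal_mask(4, 8)
    # Each pixel has exactly one antipode, and the relation is symmetric.
    print(mask.shape, bool(mask.sum(axis=1).min() == 1), bool((mask == mask.T).all()))
```

Because taking the antipode twice returns the original pixel, the resulting matrix is a symmetric permutation matrix. One plausible use, stated here purely as an assumption, is to combine such a matrix with an attention map so that each location can additionally attend to its counterpart on the opposite hemisphere.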
