Imagine360: Immersive 360 Video Generation from Perspective Anchor

Jing Tan, Shuai Yang, Tong Wu, Jingwen He, Yuwei Guo, Ziwei Liu, Dahua Lin

TL;DR

Imagine360 generates high-quality 360° videos from perspective anchors by bridging the perspective and panorama domains through a dual-branch diffusion framework, augmented with cross-domain spherical attention and an antipodal mask to capture long-range panoramic motion. Elevation-aware training and inference components handle varying input elevations, while resource-efficient fine-tuning (LoRA) enables learning from limited panorama data. The approach is trained on a collected dataset (10,744 samples from WEB360 and YouTube) and evaluated with quantitative metrics (VBench and Q-Align) and human studies, where it demonstrates superior performance in both panorama video quality and panorama image outpainting. Overall, Imagine360 advances personalized, immersive 360° video creation by transferring local perspective content into globally coherent spherical motion.

Abstract

$360^\circ$ videos offer a hyper-immersive experience that allows viewers to explore a dynamic scene from full 360 degrees. To achieve more user-friendly and personalized content creation in $360^\circ$ video format, we seek to lift standard perspective videos into $360^\circ$ equirectangular videos. To this end, we introduce Imagine360, the first perspective-to-$360^\circ$ video generation framework that creates high-quality $360^\circ$ videos with rich and diverse motion patterns from video anchors. Imagine360 learns fine-grained spherical visual and motion patterns from limited $360^\circ$ video data with several key designs. 1) First, we adopt a dual-branch design, comprising a perspective and a panorama video denoising branch that provide local and global constraints for $360^\circ$ video generation, with the motion module and spatial LoRA layers fine-tuned on extended web $360^\circ$ videos. 2) Additionally, an antipodal mask is devised to capture long-range motion dependencies, enhancing the reversed camera motion between antipodal pixels across hemispheres. 3) To handle diverse perspective video inputs, we propose elevation-aware designs that adapt to varying video masking due to changing elevations across frames. Extensive experiments show that Imagine360 achieves superior graphics quality and motion coherence among state-of-the-art $360^\circ$ video generation methods. We believe Imagine360 holds promise for advancing personalized, immersive $360^\circ$ video creation.
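To make the notion of "antipodal pixels" in an equirectangular frame concrete, the following is a minimal sketch of the underlying geometry only; it is not the authors' antipodal mask. It assumes a pixel-centre convention in which longitude spans $[-\pi, \pi)$ across the width and latitude spans $(-\pi/2, \pi/2)$ down the height. The function names `pixel_to_direction` and `antipodal_pixel` are hypothetical.

```python
import numpy as np

def pixel_to_direction(u, v, width, height):
    """Unit viewing direction for the centre of equirectangular pixel (u, v).

    Assumed convention (not from the paper): u indexes columns, v indexes rows,
    longitude grows with u, latitude decreases with v.
    """
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.array([
        np.cos(lat) * np.cos(lon),
        np.cos(lat) * np.sin(lon),
        np.sin(lat),
    ])

def antipodal_pixel(u, v, width, height):
    """Pixel whose viewing direction is exactly opposite to that of (u, v).

    The antipode of (lon, lat) is (lon + pi, -lat): shift by half the width
    (with wrap-around) and mirror the row index.
    """
    if width % 2 != 0:
        raise ValueError("width must be even for an exact half-width shift")
    return (u + width // 2) % width, height - 1 - v

# Usage: the two directions cancel out, confirming they are opposite.
u2, v2 = antipodal_pixel(1, 2, 16, 8)
residual = pixel_to_direction(1, 2, 16, 8) + pixel_to_direction(u2, v2, 16, 8)
```

Because opposite points on the sphere see camera motion in reversed directions, a mapping of this kind is one plausible way to pair the pixels whose motion the abstract describes as reversed.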

Paper Structure

This paper contains 26 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Overview of Imagine360. Imagine360 lifts standard perspective video into $360^\circ$ video, enabling dynamic scene experience from full 360 degrees. Compared to Follow-Your-Canvas, which focuses only on perspective visual and motion patterns, our approach achieves more plausible spherical video patterns. Best viewed with Acrobat Reader for the animated 360 videos.
  • Figure 2: Pipeline of Imagine360. Given perspective anchor video guidance, Imagine360 leverages a dual-branch video denoising structure, with parallel panorama and perspective branches, to denoise 360° videos with plausible panoramic patterns. Additionally, we devise cross-domain spherical attention with antipodal masking to capture long-range dependencies for reversed antipodal motion. Finally, we introduce elevation-aware designs to handle diverse video inputs with changing elevations.
  • Figure 3: Cross-domain Spherical Attention (perspective branch) highlights interaction for direct-mapped pixels (spherical mask) and antipodal-mapped pixels (antipodal mask) between panorama and perspective domains.
  • Figure 4: Elevation-aware sampling augments the training samples with diverse elevation trajectories.
  • Figure 5: Qualitative comparisons of 360° video generation among state-of-the-art methods. Imagine360 generates 360° videos with superior visual quality and plausible panoramic patterns.
  • ...and 4 more figures
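
As an illustration of why changing elevations lead to varying video masks (the elevation-aware designs mentioned for Figures 2 and 4), the sketch below computes which equirectangular pixels a pinhole perspective camera covers at a given elevation, and stacks one mask per frame along an elevation trajectory. It shows the geometry only and is not the authors' sampling procedure. The 90° field of view, the axis convention (x forward, y left, z up), and the names `perspective_mask` and `mask_trajectory` are all assumptions.

```python
import numpy as np

def perspective_mask(width, height, elevation_deg, hfov_deg=90.0, vfov_deg=90.0):
    """Boolean (height, width) mask of equirectangular pixels seen by a
    perspective camera pitched up by `elevation_deg` (hypothetical helper)."""
    # Longitude/latitude at pixel centres.
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit directions in the world frame (x forward, y left, z up).
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)

    # Project onto the camera's forward and up axes after pitching by e.
    e = np.deg2rad(elevation_deg)
    x_cam = np.cos(e) * x + np.sin(e) * z
    z_cam = -np.sin(e) * x + np.cos(e) * z

    # A pixel is visible if it lies in front of the camera and inside the frustum.
    in_front = x_cam > 0
    depth = np.where(in_front, x_cam, 1.0)  # avoid division by zero behind the camera
    inside_h = np.abs(y / depth) <= np.tan(np.deg2rad(hfov_deg) / 2.0)
    inside_v = np.abs(z_cam / depth) <= np.tan(np.deg2rad(vfov_deg) / 2.0)
    return in_front & inside_h & inside_v

def mask_trajectory(width, height, elevations_deg):
    """One mask per frame for a sequence of per-frame elevations."""
    return np.stack([perspective_mask(width, height, e) for e in elevations_deg])

# Usage: the visible region changes shape as the camera tilts from horizon to zenith.
masks = mask_trajectory(64, 32, [0.0, 45.0, 90.0])
```

At zero elevation the mask is a compact patch around the image centre, while at 90° it spreads across the entire top band of the panorama, which is why a fixed mask shape cannot describe inputs whose elevation varies.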