PanFlow: Decoupled Motion Control for Panoramic Video Generation

Cheng Zhang, Hanwen Liang, Donny Y. Chen, Qianyi Wu, Konstantinos N. Plataniotis, Camilo Cruz Gambardella, Jianfei Cai

TL;DR

PanFlow tackles the challenge of motion control in panoramic video generation by decoupling camera rotation from derotated flow and introducing loop-consistent diffusion via spherical noise warping and latent rotation. It presents a decoupled motion framework that leverages spherical optical flow to separate rotation from translation/object motion, followed by inverse rotation to recover full motion. A motion-rich panoramic dataset with frame-level pose and flow annotations supports robust training and evaluation. Empirical results show PanFlow surpasses prior methods in motion fidelity, temporal coherence, and visual quality, and it demonstrates practical applications in motion transfer and video editing.

Abstract

Panoramic video generation has attracted growing attention due to its applications in virtual reality and immersive media. However, existing methods lack explicit motion control and struggle to generate scenes with large and complex motions. We propose PanFlow, a novel approach that exploits the spherical nature of panoramas to decouple the highly dynamic camera rotation from the input optical flow condition, enabling more precise control over large and dynamic motions. We further introduce a spherical noise warping strategy to promote loop consistency in motion across panorama boundaries. To support effective training, we curate a large-scale, motion-rich panoramic video dataset with frame-level pose and flow annotations. We also showcase the effectiveness of our method in various applications, including motion transfer and video editing. Extensive experiments demonstrate that PanFlow significantly outperforms prior methods in motion fidelity, visual quality, and temporal coherence. Our code, dataset, and models are available at https://github.com/chengzhag/PanFlow.

Paper Structure

This paper contains 37 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Spherical Camera Optical Flow. The optical flow from a panoramic video (left) can be interpreted as a spherical camera optical flow (right). For complex motion $\vec{f}$, the camera rotation yields an analytic rotation flow $\vec{f}_r$ on the sphere. By decomposing $\vec{f}$ into $\vec{f}_r$ and its residual, we obtain a derotated flow $\vec{f}_d$ that more clearly captures camera translation and object dynamics.
  • Figure 2: Our proposed PanFlow pipeline. Given an input image and text prompt, PanFlow generates a panoramic video using decoupled motion from a reference video. We first estimate a decoupled optical flow from the reference video; its derotated flow is used to generate latent noise via spherical noise warping. The latent noise then serves as the motion condition for a video diffusion transformer with LoRA fine-tuning, which generates derotated videos. Finally, the decoupled rotation is accumulated and applied to the generated video frames to recover the full motion.
  • Figure 3: Comparison with Panoramic Video Generation Methods. We compare our proposed PanFlow with the baselines conditioned on the same input images and text prompts (360DVD uses text prompts only). Generated frames are shown at the same timestamps. We highlight regions exhibiting more dynamic motion, high-fidelity textures, and consistent geometry.
  • Figure 4: Comparison with Motion-controlled Video Generation Methods. All the methods generate videos conditioned on the same input images and motion flows. PanFlow better follows the flow conditions and aligns more closely with the ground truth. We use rectangles to highlight regions with large motion. (Text prompts omitted without loss of generality.)
  • Figure 5: Ablation Study. Panoramas are horizontally rotated by 180 degrees to better visualize the seam. We zoom in on the seam to compare different loop consistency setups.
  • ...and 2 more figures
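
The flow decomposition described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`rotation_flow`, `derotate`, `yaw_matrix`), the equirectangular conventions (y-up, longitude measured from +z, pixel-centre sampling), and the direction in which the rotation is applied are all assumptions made for illustration. The sketch computes the analytic rotation flow $\vec{f}_r$ induced by a 3×3 rotation on an equirectangular grid and takes the derotated flow as the plain residual $\vec{f}_d = \vec{f} - \vec{f}_r$.

```python
import numpy as np

def equirect_dirs(h, w):
    """Unit viewing directions at pixel centres (assumed y-up convention)."""
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi   # [-pi, pi)
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi         # top row is +lat
    lon, lat = np.meshgrid(lon, lat)
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

def dirs_to_pixels(d, h, w):
    """Inverse of equirect_dirs: directions back to pixel coordinates."""
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    px = (lon / (2.0 * np.pi) + 0.5) * w - 0.5
    py = (0.5 - lat / np.pi) * h - 0.5
    return px, py

def rotation_flow(R, h, w):
    """Analytic flow f_r (in pixels) produced by rotating every direction by R."""
    d_rot = equirect_dirs(h, w) @ R.T                      # apply R to each direction
    px, py = dirs_to_pixels(d_rot, h, w)
    gx, gy = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    dx = (px - gx + w / 2.0) % w - w / 2.0                 # wrap across the left/right seam
    return np.stack([dx, py - gy], axis=-1)

def derotate(flow, R):
    """Residual f_d = f - f_r (simple subtraction, an assumption of this sketch)."""
    h, w = flow.shape[:2]
    return flow - rotation_flow(R, h, w)

def yaw_matrix(theta):
    """Hypothetical helper: rotation about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# A pure yaw appears as a uniform horizontal shift in equirectangular space,
# so derotating a flow that contains only that yaw leaves a zero residual.
R = yaw_matrix(np.pi / 8)            # 1/16 of a full turn
f_r = rotation_flow(R, 32, 64)       # 4 px shift on a 64-px-wide panorama
f_d = derotate(f_r, R)
```

Under these conventions a pure yaw maps to a constant horizontal shift, which makes a convenient sanity check. Pitch and roll produce strongly non-uniform flow near the poles, which is the kind of motion the decomposition is meant to factor out before conditioning.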