DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang
TL;DR
DynamicScaler tackles the challenge of generating coherent, long-duration panoramic videos with arbitrary resolutions and aspect ratios without model fine-tuning. It introduces an Offset Shifting Denoiser to evenly denoise a full panorama via shifting windows, and Global Motion Guidance to preserve global motion structure while refining local details through a hierarchical upsampling path. A Panoramic Projection Denoiser enables efficient 360° FoV outputs by projecting ERP latents into perspective viewports for denoising, with spherical mappings to maintain geometric fidelity. Temporal extension further yields long-duration and loopable panoramic videos, overcoming memory constraints of standard diffusion models. Overall, the method demonstrates superior visual quality, motion coherence, and scalability for immersive AR/VR content generation.
Abstract
The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360$°$ panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose $\textbf{DynamicScaler}$, which addresses these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that enables efficient, synchronous, and coherent denoising of panoramic dynamic scenes using a fixed-resolution diffusion model and a rotating window. This ensures seamless boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of the output video resolution. The project page is available at $\href{https://dynamic-scaler.pages.dev/new}{https://dynamic-scaler.pages.dev/new}$.
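
The rotating-window idea can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation: the 1-D circular latent, the helper names (`window_starts`, `shifted_window_pass`, `run_offset_shifting`), the linear offset schedule, and the toy `halve` denoiser are all assumptions made for illustration. The sketch only shows how a fixed-size window can tile a wrap-around axis, with the tiling offset changing at every step so that window boundaries never stay at the same columns.

```python
from typing import Callable, List

Patch = List[float]


def window_starts(width: int, window: int, offset: int) -> List[int]:
    """Start columns of non-overlapping windows tiling a circular axis."""
    if width % window != 0:
        # Simplifying assumption of this sketch, not a stated limit of the method.
        raise ValueError("this sketch assumes width is a multiple of window")
    return [(offset + k * window) % width for k in range(width // window)]


def shifted_window_pass(
    latent: Patch, window: int, offset: int, denoise: Callable[[Patch], Patch]
) -> Patch:
    """Apply a fixed-size denoiser to every window of one shifted tiling."""
    width = len(latent)
    out = list(latent)
    for start in window_starts(width, window, offset):
        # Modular indexing lets a window wrap across the left/right border.
        cols = [(start + i) % width for i in range(window)]
        patch = denoise([latent[c] for c in cols])
        for c, value in zip(cols, patch):
            out[c] = value
    return out


def run_offset_shifting(
    latent: Patch, window: int, steps: int, stride: int,
    denoise: Callable[[Patch], Patch],
) -> Patch:
    """Run several passes, moving the tiling offset by `stride` each step."""
    for t in range(steps):
        offset = (t * stride) % len(latent)  # hypothetical linear schedule
        latent = shifted_window_pass(latent, window, offset, denoise)
    return latent


def halve(patch: Patch) -> Patch:
    """Toy stand-in for a diffusion denoising step."""
    return [0.5 * v for v in patch]


result = run_offset_shifting([1.0] * 8, window=4, steps=3, stride=1, denoise=halve)
```

In this toy setting every column is processed exactly once per step, so the work is spread evenly, and the memory needed per call depends on the window size rather than on the panorama width.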
