4K4DGen: Panoramic 4D Generation at 4K Resolution

Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

TL;DR

4K4DGen addresses the absence of annotated 4D panoramic data by transferring 2D diffusion priors to omnidirectional content. It introduces a Panoramic Denoiser that coherently animates 360° panoramas and a Dynamic Panoramic Lifting stage that converts the resulting panoramic video into a 4D Gaussian-based representation with spatial-temporal regularization. The method achieves 4K panorama-to-4D generation and enables real-time, free-viewpoint exploration with improved temporal and spatial coherence, as validated through no-reference metrics and user studies. By combining spherical latent processing with Gaussian splatting, the approach produces high-fidelity, immersive 4D scenes from a single panorama, and it is the first to achieve 4K omnidirectional 4D generation without labeled 4D data.
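The TL;DR mentions spatial-temporal regularization of the 4D Gaussian representation but does not give its form here. As a minimal illustration only, assume each Gaussian's center is stored once per frame. One common kind of temporal prior penalizes the acceleration (the second finite difference) of those centers. This leaves constant-velocity motion unpenalized and discourages frame-to-frame jitter. The function name and the (T, N, 3) layout are hypothetical, and this is not the paper's loss.

```python
import numpy as np


def temporal_acceleration_penalty(centers: np.ndarray) -> float:
    """Hypothetical temporal smoothness term for dynamic Gaussians (not the paper's loss).

    centers: assumed layout (T, N, 3), the xyz center of each of N Gaussians at each
    of T frames. Returns the mean squared second finite difference over frames, which
    is zero for constant-velocity motion and grows with frame-to-frame jitter.
    """
    if centers.ndim != 3 or centers.shape[-1] != 3 or centers.shape[0] < 3:
        raise ValueError("expected shape (T, N, 3) with T >= 3")
    velocity = centers[1:] - centers[:-1]        # (T-1, N, 3) per-frame displacement
    acceleration = velocity[1:] - velocity[:-1]  # (T-2, N, 3) change in displacement
    return float(np.mean(np.sum(acceleration ** 2, axis=-1)))


# Usage: one Gaussian sliding at constant speed vs. one flickering back and forth.
smooth = np.zeros((4, 1, 3))
smooth[:, 0, 0] = np.arange(4, dtype=float)
jitter = np.zeros((4, 1, 3))
jitter[:, 0, 0] = [1.0, -1.0, 1.0, -1.0]
print(temporal_acceleration_penalty(smooth))  # 0.0
print(temporal_acceleration_penalty(jitter))  # 16.0
```

An acceleration penalty is used here instead of a velocity penalty because animated regions (such as flowing water) are supposed to move. Penalizing velocity would push them toward a static scene.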

Abstract

The rapid growth of virtual reality and augmented reality (VR/AR) technologies has driven increasing demand for high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic objects or perform outpainting from a single perspective image. They therefore fail to meet the requirements of VR/AR applications, which need free-viewpoint, 360° virtual views in which users can move in all directions. In this work, we tackle the challenging task of elevating a single panorama to an immersive 4D experience. For the first time, we demonstrate the generation of omnidirectional dynamic scenes with 360° views at 4K (4096 × 2048) resolution, providing an immersive user experience. Our pipeline enables natural scene animation and optimizes a set of dynamic Gaussians using efficient splatting techniques for real-time exploration. To overcome the lack of scene-scale annotated 4D data and models, especially in panoramic formats, we propose a novel Panoramic Denoiser that adapts generic 2D diffusion priors to animate 360° images consistently, transforming them into panoramic videos with dynamic content in targeted regions. We then propose Dynamic Panoramic Lifting to elevate the panoramic video into a 4D immersive environment while preserving spatial and temporal consistency. We transfer prior knowledge from 2D perspective-domain models to the panoramic domain and perform 4D lifting with spatial appearance and geometry regularization. Together, these steps achieve high-quality Panorama-to-4D generation at 4K resolution for the first time.
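The abstract describes applying 2D diffusion priors, trained on perspective images, to 360° panoramas. Moving between the two domains requires mapping each ray of a perspective camera onto the sphere and looking up the matching equirectangular pixel. The following sketch makes several assumptions:

- The panorama uses a 2:1 equirectangular layout (like 4096 × 2048), with longitude across the width and latitude down the height.
- The pinhole camera is described by its field of view, yaw, and pitch.
- Sampling is nearest-neighbour.

The function name and axis conventions are illustrative and are not taken from the paper's code.

```python
import numpy as np


def perspective_from_equirect(pano: np.ndarray, fov_deg: float, yaw_deg: float,
                              pitch_deg: float, out_h: int, out_w: int) -> np.ndarray:
    """Sample a pinhole view from an equirectangular panorama (nearest neighbour).

    Assumed layout: pano has shape (H, W, C); columns span longitude [-180°, 180°)
    left to right, and rows span latitude 90° (top) down to -90° (bottom).
    """
    H, W = pano.shape[:2]
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Camera-space ray for every output pixel: +x right, +y up, +z forward.
    xs = np.arange(out_w) - (out_w - 1) / 2
    ys = np.arange(out_h) - (out_h - 1) / 2
    x, y = np.meshgrid(xs, ys)
    rays = np.stack([x, -y, np.full_like(x, f)], axis=-1)  # negate y: rows grow downward
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Tilt the camera up by `pitch` (about x), then turn it right by `yaw` (about y).
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    r_pitch = np.array([[1.0, 0.0, 0.0],
                        [0.0, np.cos(p), np.sin(p)],
                        [0.0, -np.sin(p), np.cos(p)]])
    r_yaw = np.array([[np.cos(t), 0.0, np.sin(t)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(t), 0.0, np.cos(t)]])
    world = rays @ (r_yaw @ r_pitch).T

    # Unit-sphere direction -> (longitude, latitude) -> panorama pixel indices.
    lon = np.arctan2(world[..., 0], world[..., 2])
    lat = np.arcsin(np.clip(world[..., 1], -1.0, 1.0))
    cols = np.floor((lon + np.pi) / (2 * np.pi) * W).astype(int) % W  # wrap the 360° seam
    rows = np.clip(np.floor((np.pi / 2 - lat) / np.pi * H).astype(int), 0, H - 1)
    return pano[rows, cols]


# Usage: a 512×512 view with a 90° field of view, looking 30° right and 10° up
# from a 4096×2048 panorama.
pano = np.zeros((2048, 4096, 3), dtype=np.uint8)
view = perspective_from_equirect(pano, 90, 30, 10, 512, 512)
```

Sampling several such views at different yaw angles, with overlapping fields of view, yields perspective crops that a 2D animator can process. A practical version would likely use bilinear sampling and operate on latents rather than RGB pixels.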

Paper Structure (43 sections, 6 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: 4K4DGen takes a static panoramic image with a resolution of 4096 × 2048, animates it through user interaction or an input mask, and transforms the static panorama into a dynamic Gaussian Splatting representation. 4K4DGen supports rendering novel views at various timestamps, enriching immersive virtual exploration.
  • Figure 2: Panoramic Denoiser adapts diffusion priors from the perspective domain to the panoramic domain by simultaneously denoising perspective views and integrating them into spherical latents at each denoising step. This approach ensures consistent animation across multiple views.
  • Figure 3: Overall Pipeline. Beginning with a static panorama as input, the Animating Phase generates a panoramic video by first mapping the panorama into a spherical latent space, followed by denoising within the perspective space, fusing back to the spherical latent space at each step, and finally transforming it into the panoramic space. In the 4D Lifting Phase, a series of dynamic Gaussians is employed to lift the panoramic video into a 4D representation, ensuring both spatial and temporal consistency.
  • Figure 4: Comparison between 4K4DGen and 3D-Cinemagraphy. We present the input static panorama (Pano RGB), the corresponding text prompts, and the rendered results from different views and at various timestamps. 4K4DGen (Ours) effectively generates 4D scenes that are both spatially and temporally consistent, while 3D-Cinemagraphy (3D-Cin.) suffers from ghosting artifacts in the middle frames.
  • Figure 5: Comparison to Different Animators: Animators trained primarily on perspective images tend to produce limited motion when applied to panoramas, and their output resolution may also be limited. Conversely, animating perspective views individually can lead to inconsistencies between overlapping views.
  • ...and 2 more figures
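Figures 2 and 3 describe the Panoramic Denoiser as denoising perspective views and fusing them back into a spherical latent at every denoising step. The following is a simplified sketch of that idea, not the authors' fusion rule. It makes three assumptions:

- Views are horizontal windows of an equirectangular latent that wrap across the 360° seam, rather than true perspective projections.
- Overlaps are fused by plain averaging, in the style of MultiDiffusion.
- `denoise_view` is a hypothetical stand-in for one step of a perspective-domain diffusion model.

```python
from typing import Callable

import numpy as np

Window = np.ndarray  # (H, view_width, C) slice of the latent


def fuse_step(latent: np.ndarray, view_width: int, stride: int,
              denoise_fn: Callable[[Window], Window]) -> np.ndarray:
    """Denoise overlapping, seam-wrapping windows and average them back together."""
    W = latent.shape[1]
    if not (0 < stride <= view_width <= W):
        raise ValueError("need 0 < stride <= view_width <= W so every column is covered")
    acc = np.zeros(latent.shape, dtype=float)
    count = np.zeros((1, W, 1))
    for start in range(0, W, stride):
        cols = (start + np.arange(view_width)) % W  # unique indices that wrap the seam
        acc[:, cols] += denoise_fn(latent[:, cols])
        count[:, cols] += 1
    return acc / count  # each column is the mean of all windows that cover it


def panoramic_denoise(latent: np.ndarray, steps: int, view_width: int, stride: int,
                      denoise_view: Callable[[Window, int], Window]) -> np.ndarray:
    """Run `steps` denoise-and-fuse iterations, re-extracting views from the fused latent."""
    for step in range(steps):
        latent = fuse_step(latent, view_width, stride,
                           lambda window: denoise_view(window, step))
    return latent


# Usage with a toy "denoiser" that simply shrinks its input each step.
rng = np.random.default_rng(0)
lat = rng.normal(size=(4, 16, 2))
out = panoramic_denoise(lat, steps=3, view_width=8, stride=4,
                        denoise_view=lambda window, step: 0.5 * window)
```

Each step re-extracts every window from the fused latent. As a result, content near the seam or near a window boundary is pulled toward agreement between the overlapping views, which is the cross-view consistency the captions attribute to this design.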