Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting

Hao Ai, Zidong Cao, Haonan Lu, Chen Chen, Jian Ma, Pengyuan Zhou, Tae-Kyun Kim, Pan Hui, Lin Wang

TL;DR

Dream360 tackles the challenge of generating diverse, high-fidelity 360° panoramas from narrow FoV viewports by introducing a two-stage transformer framework. It first learns sphere-aware codebooks via S-VQGAN using spherical harmonics, then refines outputs with a frequency-aware consistency loss to recover high-frequency details. The approach outperforms prior methods on outdoor panorama data and is validated by a VR user study with 15 participants, demonstrating strong realism, viewpoint alignment, and interactivity. This work advances VR content creation by leveraging sphere-aware representations and frequency-domain supervision to produce immersive skylines, buildings, and landscapes from limited input views. Practical impact includes enabling personalized, VR-ready panorama generation for virtual tourism and related applications.

Abstract

360° images, with a field-of-view (FoV) of 180°×360°, provide immersive and realistic environments for emerging virtual reality (VR) applications such as virtual tourism, where users want to create diverse panoramic scenes from a narrow-FoV photo taken from a viewpoint with a portable device. This raises a technical challenge: "How can users freely create diverse and immersive virtual scenes from a narrow-FoV image with a specified viewport?" To this end, we propose a transformer-based 360° image outpainting framework called Dream360, which generates diverse, high-fidelity, and high-resolution panoramas from user-selected viewports while accounting for the spherical properties of 360° images. Compared with existing methods, e.g., [3], which primarily focus on inputs with rectangular masks at central locations and overlook the spherical property of 360° images, Dream360 offers higher outpainting flexibility and fidelity based on the spherical representation. Dream360 comprises two key learning stages: (I) codebook-based panorama outpainting via Spherical-VQGAN (S-VQGAN), and (II) frequency-aware refinement with a novel frequency-aware consistency loss. Specifically, S-VQGAN learns a sphere-specific codebook from spherical harmonic (SH) values, providing a better representation of the spherical data distribution for scene modeling. The frequency-aware refinement matches the resolution and further improves the semantic consistency and visual fidelity of the generated results. Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods. We also conducted a user study with 15 participants who interactively evaluated the quality of the generated results in VR, demonstrating the flexibility and superiority of the Dream360 framework.
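
For reference, FID is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below follow the standard definition and are not taken from this paper: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the feature mean and covariance of the real and generated sets. The feature extractor and sample counts used for Dream360 are not stated in this section.

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

A lower value indicates that the generated feature distribution is closer to the real one, which is why the abstract reports lower FID as an improvement.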

Paper Structure (23 sections, 8 equations, 16 figures, 5 tables)


Figures (16)

  • Figure 1: The patches of cubemap projection from a panorama.
  • Figure 2: Overview of Dream360, consisting of two stages: Codebook-based panorama outpainting and Frequency-aware refinement.
  • Figure 3: Real-valued spherical harmonics with degree $l=2$ and order $|m|\leq l$ corresponding to a panorama located at the spherical coordinate.
  • Figure 4: The visual comparison of panorama reconstruction performance at the resolution of $256\times512$ between VQGAN and our S-VQGAN.
  • Figure 5: Qualitative results on outdoor scenes of the SUN360 dataset. In the last two columns, we show the diverse outpainting results generated by our Dream360. More qualitative examples are shown in Fig. \ref{fig:resultsL}.
  • ...and 11 more figures
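
The real-valued spherical harmonics of degree $l=2$ mentioned in the Figure 3 caption can be evaluated per pixel of an equirectangular panorama. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the pixel-to-sphere convention (row 0 nearest the north pole, longitude in $(-\pi, \pi)$), and the ordering $m=-2,\dots,2$ are all illustrative choices. How S-VQGAN actually consumes SH values is not described in this section.

```python
import math

# Normalization constants of the real spherical harmonics of degree l = 2.
C_OFF = 0.5 * math.sqrt(15.0 / math.pi)    # m = -2, -1, +1
C_ZZ = 0.25 * math.sqrt(5.0 / math.pi)     # m = 0
C_DIFF = 0.25 * math.sqrt(15.0 / math.pi)  # m = +2


def equirect_to_unit_vector(row, col, height, width):
    """Map an equirectangular pixel centre to a unit vector (x, y, z).

    Assumed convention (hypothetical, not from the paper): the polar angle
    theta grows from 0 at the top row to pi at the bottom row, and the
    longitude phi spans (-pi, pi) from the left to the right column.
    """
    theta = math.pi * (row + 0.5) / height
    phi = 2.0 * math.pi * (col + 0.5) / width - math.pi
    sin_theta = math.sin(theta)
    return (sin_theta * math.cos(phi),
            sin_theta * math.sin(phi),
            math.cos(theta))


def real_sh_degree2(x, y, z):
    """Return the five real SH values of degree 2, ordered m = -2..2.

    The input (x, y, z) is expected to be a unit vector.
    """
    return [
        C_OFF * x * y,                # m = -2
        C_OFF * y * z,                # m = -1
        C_ZZ * (3.0 * z * z - 1.0),   # m =  0
        C_OFF * x * z,                # m = +1
        C_DIFF * (x * x - y * y),     # m = +2
    ]


def sh_feature_map(height, width):
    """Build a height x width grid holding 5 SH values per pixel."""
    return [
        [real_sh_degree2(*equirect_to_unit_vector(r, c, height, width))
         for c in range(width)]
        for r in range(height)
    ]
```

Calling `sh_feature_map(256, 512)` would produce one five-value SH vector per pixel at the $256\times512$ resolution quoted for Figure 4. Higher degrees would add $2l+1$ values each and need a general recurrence rather than the closed forms used here.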