HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

TL;DR

HORIZON addresses semantically controllable, high-resolution panorama synthesis with a two-stage learning framework and a Spherical Parallel Modeling (SPM) approach, which combines spherical relative embeddings and conditioning to enable efficient parallel decoding while preserving spherical coherence. It integrates image and text semantic guidance via CLIP to control content, and uses a two-pass scheme to enforce left-right continuity across the panorama boundaries. On StreetLearn and related data, the method achieves state-of-the-art results in panorama generation, view extrapolation, and guided generation, with notable improvements in edge continuity and inference efficiency. The work targets practical VR and AR panorama creation with controllable, faithful, high-resolution outputs.

Abstract

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct application of these methods to panorama synthesis yields distorted content. In this study, we unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling. Our pioneering approach empowers users with semantic control, harnessing both image and text inputs, while concurrently streamlining the generation of high-resolution panoramas using parallel decoding. We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics. Our research elevates the controllability, efficiency, and fidelity of panorama synthesis to new levels.
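
The edge-discontinuity issue mentioned above arises because the leftmost and rightmost columns of an equirectangular panorama are neighbors on the sphere, even though they sit far apart in the flat image. The following is a minimal illustrative sketch of that wraparound geometry, not the paper's implementation: the function name `circular_column_offset` and the 8-column patch grid are hypothetical, chosen only for this example.

```python
def circular_column_offset(col_a: int, col_b: int, num_cols: int) -> int:
    """Signed shortest horizontal offset from col_a to col_b on a ring.

    Hypothetical helper for illustration: columns of a 360-degree panorama
    wrap around, so column num_cols - 1 is adjacent to column 0.
    """
    if num_cols <= 0:
        raise ValueError("num_cols must be positive")
    half = num_cols // 2
    # Shift by half a turn, wrap into [0, num_cols), then shift back so the
    # result lies in [-half, num_cols - half).
    return (col_b - col_a + half) % num_cols - half


if __name__ == "__main__":
    num_cols = 8  # assumed patch-grid width, for illustration only
    flat = (num_cols - 1) - 0
    wrapped = circular_column_offset(0, num_cols - 1, num_cols)
    print(f"flat offset: {flat}, wrapped offset: {wrapped}")
```

With a flat offset the two boundary columns appear seven patches apart, whereas the wrapped offset treats them as direct neighbors, which is the property a spherically aware position encoding needs to respect.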

Paper Structure (38 sections, 9 equations, 18 figures, 6 tables)

Figures (18)

  • Figure 1: HORIZON supports multi-type panorama synthesis
  • Figure 2: Modeling strategy: we progressively improve the modeling strategy from ARM to LPM and eventually to SPM, achieving both high efficiency and high fidelity. In this figure, the red boxes show the current view patch to be generated, while the blue boxes show the condition window.
  • Figure 3: Generated examples. The first row presents the ground-truth and generated images. The second to fourth rows show four snapshots of each method's results, rendered in a 360-degree panorama viewer.
  • Figure 4: Discontinuity vs. continuity. The three randomly selected cases present results from the baseline and our model. The top images are generated by the baseline (LPM) method and the bottom images by our model (SPM). There is an obvious split line in the middle of each top image, while the boundary is smooth in the bottom examples.
  • Figure 5: View Extrapolation. The first column gives the input samples, the second column presents the ground truth examples, and the third column demonstrates the generated panorama.
  • ...and 13 more figures