Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting

Fei Teng, Kai Luo, Sheng Wu, Siyu Li, Pujun Guo, Jiale Wei, Jiaming Zhang, Kunyu Peng, Kailun Yang

TL;DR

This paper proposes Percep360, the first panoramic generation method for autonomous driving. Percep360 generates coherent panoramic data from control signals, using stitched panoramic data as supervision, and introduces a Probabilistic Prompting Method (PPM) to make the generated panoramic images controllable.

Abstract

Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task. Complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street view generation models have demonstrated strong data regeneration capabilities, they can only learn from the fixed data distribution of existing datasets and cannot leverage stitched pinhole images as a supervisory signal. In this paper, we propose the first panoramic generation method Percep360 for autonomous driving. Percep360 enables coherent generation of panoramic data with control signals based on the stitched panoramic data. Percep360 focuses on two key aspects: coherence and controllability. Specifically, to overcome the inherent information loss caused by the pinhole sampling process, we propose the Local Scenes Diffusion Method (LSDM). LSDM reformulates the panorama generation as a spatially continuous diffusion process, bridging the gaps between different data distributions. Additionally, to achieve the controllable generation of panoramic images, we propose a Probabilistic Prompting Method (PPM). PPM dynamically selects the most relevant control cues, enabling controllable panoramic image generation. We evaluate the effectiveness of the generated images from three perspectives: image quality assessment (i.e., no-reference and with reference), controllability, and their utility in real-world Bird's Eye View (BEV) segmentation. Notably, the generated data consistently outperforms the original stitched images in no-reference quality metrics and enhances downstream perception models. The source code will be publicly available at https://github.com/FeiT-FeiTeng/Percep360.

Paper Structure

This paper contains 13 sections, 16 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Percep360 leverages diverse prompts to generate coherent panoramic images. The generated images exhibit improved quality and controllability. As a data generator, Percep360 synthesizes diverse data to avoid the expensive process of data processing, thereby enhancing the performance of semantic mapping.
  • Figure 2: Visualizations of the pinhole samples, the stitched image, the ground truth used to train the generation model, and the results of an existing approach. Owing to the inherent information loss and misalignment in pinhole camera sampling, stitching-based methods struggle to produce coherent panoramic images. Moreover, when these stitched images are used as inputs, existing generation models often inherit the stitching errors. In contrast, the proposed LSDM approach generates images with improved coherence, particularly at stitching boundaries.
  • Figure 3: The overall architecture of Percep360 leverages the BEV map, textual prompt, depth map, and mask map jointly as guidance signals. With the integration of the LSDM and PPM modules, the framework achieves coherent and controllable panoramic generation.
  • Figure 4: The architectures of LSDM and PPM are illustrated in panels (a) and (b), respectively. By applying circular rotations to diverse prompts and BEV features, LSDM reformulates panoramic image generation as a spatially continuous diffusion process. PPM further enhances controllability by dynamically selecting the most relevant control cues, leading to accurate and consistent panoramic images.
  • Figure 5: Visualizations of the ground truth and of the generation results of BEVControl and MagicDrive. Regions of stitching misalignment are highlighted with yellow boxes. Because they lack a coherent generation capability, existing methods produce discontinuous results. In contrast, our method achieves improved coherence without compromising controllability.
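
The circular rotation mentioned in the Figure 4 caption can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it only shows, on a toy grid, how a wrap-around horizontal shift moves the 360° seam into the interior of the image while keeping per-column control signals aligned. The names circular_shift, rotate_scene, and the cue_* labels are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def circular_shift(columns: Sequence, shift: int) -> List:
    """Rotate a sequence to the right by `shift` positions with wrap-around."""
    n = len(columns)
    if n == 0:
        return []
    k = shift % n  # normalise negative or oversized shifts
    # Items pushed past the right edge re-enter on the left.
    return list(columns[n - k:]) + list(columns[:n - k])


def rotate_scene(image: Sequence[Sequence], controls: Sequence,
                 shift: int) -> Tuple[List[List], List]:
    """Apply the same horizontal rotation to an image and its per-column controls."""
    rotated_image = [circular_shift(row, shift) for row in image]
    rotated_controls = circular_shift(controls, shift)
    return rotated_image, rotated_controls


# Toy panorama: 2 rows x 8 columns, each "pixel" stores its column index.
width = 8
image = [list(range(width)) for _ in range(2)]
controls = [f"cue_{c}" for c in range(width)]  # hypothetical per-column cues

rot_image, rot_controls = rotate_scene(image, controls, shift=4)

# Columns 7 and 0 (the 360-degree wrap-around) are now adjacent in the interior.
print(rot_image[0])                      # [4, 5, 6, 7, 0, 1, 2, 3]
print(rot_controls[3], rot_controls[4])  # cue_7 cue_0

# Rotating back by the same amount restores the original layout.
restored, _ = rotate_scene(rot_image, rot_controls, shift=-4)
print(restored == image)                 # True
```

In this toy setting, a model that only ever processes interior neighbourhoods would still see the left and right borders side by side after the shift, which is one plausible reading of why rotation helps coherence at stitching boundaries. How the rotation is scheduled across diffusion steps is not specified in this summary and is left out of the sketch.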