A Survey on Text-Driven 360-Degree Panorama Generation

Hai Wang, Xiaoyu Xiang, Weihao Xia, Jing-Hao Xue

TL;DR

This survey synthesizes the growing field of text-driven 360-degree panorama generation. It details core representations (ERP, CMP, and MPP), datasets, and evaluation metrics, and systematically organizes state-of-the-art methods into two paradigms: Text-Only Generation and text-driven narrow field-of-view (NFoV) outpainting. It compares training-based and training-free approaches, analyzes design trade-offs in representation, generation framework, and model adaptation, and presents quantitative and qualitative benchmarks. The paper also identifies two closely related directions, text-driven 360-degree 3D scene generation and text-driven 360-degree panoramic video generation, and outlines remaining challenges, including evaluation, resolution, multi-modal control, and ethical considerations, offering a roadmap for future research. By highlighting industry relevance and practical constraints, it lays a foundation for production-ready panorama synthesis in VR/AR, gaming, and virtual tours.
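ERP and CMP are two parameterizations of the same viewing sphere: ERP (equirectangular projection) maps longitude and latitude linearly to image columns and rows, while CMP (cubemap projection) projects the sphere onto the six faces of a cube. The following is a minimal sketch under assumed conventions that this summary does not specify (y axis up, z axis forward, longitude increasing to the right); the function names `erp_pixel_to_direction` and `direction_to_cubemap_face` and the face labels are hypothetical. It maps an ERP pixel center to a 3D unit direction, then to the cube face that direction hits.

```python
import math

def erp_pixel_to_direction(u, v, width, height):
    """Map the center of ERP pixel (column u, row v) to a unit 3D direction.

    Assumed convention (hypothetical): longitude spans [-pi, pi) from left
    to right, latitude spans [pi/2, -pi/2] from top to bottom, y is up and
    z is forward.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z

def direction_to_cubemap_face(x, y, z):
    """Return (face, s, t): the cube face hit by (x, y, z) and in-face
    coordinates s (left to right) and t (top to bottom), both in [0, 1].

    The face with the largest absolute axis component is selected; the
    in-face orientation below is one assumed convention, not a standard.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face = "+x" if x > 0 else "-x"
        major, sc, tc = ax, (-z if x > 0 else z), -y
    elif ay >= az:
        face = "+y" if y > 0 else "-y"
        major, sc, tc = ay, x, (z if y > 0 else -z)
    else:
        face = "+z" if z > 0 else "-z"
        major, sc, tc = az, (x if z > 0 else -x), -y
    # Perspective-divide onto the face plane, then remap [-1, 1] to [0, 1].
    s = 0.5 * (sc / major + 1.0)
    t = 0.5 * (tc / major + 1.0)
    return face, s, t
```

For a 2:1 ERP image such as 1024×512, the center pixel maps to a direction very close to (0, 0, 1), which lands near the middle of the `+z` face. Pixels in the top row map close to the north pole, which is why ERP stretches content at high latitudes, whereas each CMP face is an ordinary 90-degree perspective image.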

Abstract

The advent of text-driven 360-degree panorama generation, which synthesizes 360-degree panoramic images directly from textual descriptions, marks a transformative advancement in immersive visual content creation, significantly simplifying the traditionally complex process of producing such content. Recent progress in text-to-image diffusion models has driven rapid development in this emerging field. This survey presents a comprehensive review of text-driven 360-degree panorama generation, offering an in-depth analysis of state-of-the-art algorithms. We extend our analysis to two closely related domains: text-driven 360-degree 3D scene generation and text-driven 360-degree panoramic video generation. Furthermore, we critically examine current limitations and propose promising directions for future research. A curated project page with relevant resources and research papers is available at https://littlewhitesea.github.io/Text-Driven-Pano-Gen/.

Paper Structure

This paper contains 36 sections, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Visual comparison between a 360-degree panoramic image and a conventional 2D image.
  • Figure 2: A systematic taxonomy proposed in this survey of text-driven 360-degree panorama generation methods. Methods marked with an asterisk (*) support multiple input modalities and therefore appear in more than one branch.
  • Figure 3: Chronological overview of text-driven 360-degree panorama generation approaches. Methods in lime, orange, violet, and cyan are from the training-based, training-free, NAR-based, and AR-based method subsections, respectively.
  • Figure 4: Visual comparison between the equirectangular and cubemap projections of spherical images (360-degree panoramic images).
  • Figure 5: Paradigms for Text-Driven 360-Degree Panorama Generation. (a) Text-Only Generation synthesizes 360-degree panoramas from textual descriptions only. (b) Text-Driven NFoV Outpainting uses prompts and initial NFoV images as input to generate 360-degree panoramic images.
  • ...and 3 more figures
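In the Text-Driven NFoV Outpainting paradigm of Figure 5(b), the input NFoV image covers only a small patch of the sphere, and the model must synthesize everything else. A minimal sketch of how such an input could be placed on an ERP canvas, assuming a pinhole camera facing forward (+z), square pixels, a y-up axis convention, and nearest-neighbor sampling; the function name `nfov_to_erp` and the `hfov_deg` parameter are hypothetical, and real pipelines may use other conventions and interpolation. The returned `known` mask marks the pixels copied from the NFoV image; the rest are the region to outpaint.

```python
import numpy as np

def nfov_to_erp(nfov, erp_h, hfov_deg=90.0):
    """Place a pinhole NFoV image (H x W [x C]) on a 2:1 ERP canvas facing +z.

    Returns (canvas, known): canvas has shape (erp_h, 2 * erp_h [, C]),
    and known is a boolean mask of pixels filled from the NFoV image.
    """
    h, w = nfov.shape[:2]
    erp_w = 2 * erp_h

    # Direction of every ERP pixel center (assumed convention: longitude in
    # [-pi, pi) left to right, latitude pi/2 to -pi/2 top to bottom).
    v, u = np.meshgrid(np.arange(erp_h), np.arange(erp_w), indexing="ij")
    lon = (u + 0.5) / erp_w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v + 0.5) / erp_h * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Focal length in pixels implied by the horizontal field of view.
    f = (w / 2) / np.tan(np.radians(hfov_deg) / 2)
    with np.errstate(divide="ignore", invalid="ignore"):  # guard z == 0
        px = f * x / z + w / 2   # NFoV column
        py = -f * y / z + h / 2  # NFoV row (rows grow downward)

    # Only directions in front of the camera and inside the frame are known.
    known = (z > 0) & (px >= 0) & (px < w) & (py >= 0) & (py < h)

    canvas = np.zeros((erp_h, erp_w) + nfov.shape[2:], dtype=nfov.dtype)
    rows = py[known].astype(int)
    cols = px[known].astype(int)
    canvas[known] = nfov[rows, cols]  # nearest-neighbor sampling
    return canvas, known
```

With a 90-degree NFoV input, `known` covers only a small region around the forward direction of the ERP canvas, which illustrates why NFoV outpainting must hallucinate most of the panorama from the text prompt.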