A Survey on Text-Driven 360-Degree Panorama Generation
Hai Wang, Xiaoyu Xiang, Weihao Xia, Jing-Hao Xue
TL;DR
This survey synthesizes the growing field of text-driven 360-degree panorama generation, detailing core representations (ERP, CMP, MPP), datasets, and evaluation metrics, and systematically organizing state-of-the-art methods into Text-Only generation and narrow field-of-view (NFoV) outpainting. It compares training-based and training-free approaches, analyzes design trade-offs in representation, generation framework, and model adaptation, and presents quantitative and qualitative benchmarks. The paper also identifies two closely related directions, text-driven 360-degree 3D scene generation and panoramic video generation, and outlines remaining challenges, including evaluation, resolution, multi-modal control, and ethical considerations, offering a roadmap for future research. By highlighting industry relevance and practical constraints, it provides a foundation for advancing production-ready panorama synthesis in VR/AR, gaming, and virtual tours.
Abstract
The advent of text-driven 360-degree panorama generation, which enables the synthesis of 360-degree panoramic images directly from textual descriptions, marks a transformative advancement in immersive visual content creation. This innovation significantly simplifies the traditionally complex process of producing such content. Recent progress in text-to-image diffusion models has accelerated development in this emerging field. This survey presents a comprehensive review of text-driven 360-degree panorama generation, offering an in-depth analysis of state-of-the-art algorithms. We extend our analysis to two closely related domains: text-driven 360-degree 3D scene generation and text-driven 360-degree panoramic video generation. Furthermore, we critically examine current limitations and propose promising directions for future research. A curated project page with relevant resources and research papers is available at https://littlewhitesea.github.io/Text-Driven-Pano-Gen/.
