OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
Yukun Huang, Jiwen Yu, Yanning Zhou, Jianan Wang, Xintao Wang, Pengfei Wan, Xihui Liu
TL;DR
OmniX addresses the challenge of building graphics-ready 3D scenes from panoramas by reusing pre-trained 2D flow-matching priors for unified panoramic generation, perception, and completion. It introduces a cross-modal Separate-Adapter design and the PanoX synthetic panorama dataset to enable RGB-to-X panoramas and intrinsic decomposition, culminating in a pipeline that converts distance maps into PBR-ready 3D assets for rendering, relighting, and dynamics. The main contributions are (1) OmniX, with a unified formulation and adapter architecture, (2) the PanoX dataset, with dense geometry and material maps, and (3) demonstrated panoramic perception and graphics-ready 3D scene generation across multiple tasks, validated on diverse datasets. The work enables immersive, photorealistic virtual environments and outlines practical integration with graphics pipelines. It also acknowledges limitations in speed, in surface accuracy for distance and metallic prediction, and in generalization for some material channels, which suggest avenues for future improvement.
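To make the distance-map-to-3D step concrete, here is a minimal sketch of unprojecting an equirectangular distance map into a 3D point grid. It is not the paper's pipeline: the function name `panorama_distance_to_points`, the y-up / z-forward axis convention, and the pixel-center sampling are all assumptions chosen for illustration.

```python
import numpy as np

def panorama_distance_to_points(distance: np.ndarray) -> np.ndarray:
    """Unproject an equirectangular distance map (H, W) into 3D points (H, W, 3).

    Hypothetical helper: assumes each pixel stores the Euclidean distance
    from the camera center along the ray through that pixel's center.
    """
    h, w = distance.shape
    # Longitude spans [-pi, pi) across columns; latitude spans (pi/2, -pi/2) down rows.
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both arrays have shape (H, W)
    # Unit ray directions under an assumed y-up, z-forward convention.
    dirs = np.stack(
        [np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)],
        axis=-1,
    )
    # Scale each unit ray by its per-pixel distance.
    return dirs * distance[..., None]

# Usage: a constant distance map unprojects to points on a sphere of that radius.
points = panorama_distance_to_points(np.full((4, 8), 2.0))
```

Neighboring pixels in the resulting grid can then be connected into triangles to form a mesh, onto which predicted material maps could be applied as textures; that meshing step is omitted here.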
Abstract
There are two prevalent ways to construct 3D scenes: procedural generation and 2D lifting. Among them, panorama-based 2D lifting has emerged as a promising technique, leveraging powerful 2D generative priors to produce immersive, realistic, and diverse 3D environments. In this work, we advance this technique to generate graphics-ready 3D scenes suitable for physically based rendering (PBR), relighting, and simulation. Our key insight is to repurpose 2D generative models for panoramic perception of geometry, textures, and PBR materials. Unlike existing 2D lifting approaches that emphasize appearance generation and ignore the perception of intrinsic properties, we present OmniX, a versatile and unified framework. Based on a lightweight and efficient cross-modal adapter structure, OmniX reuses 2D generative priors for a broad range of panoramic vision tasks, including panoramic perception, generation, and completion. Furthermore, we construct a large-scale synthetic panorama dataset containing high-quality multimodal panoramas from diverse indoor and outdoor scenes. Extensive experiments demonstrate the effectiveness of our model in panoramic visual perception and graphics-ready 3D scene generation, opening new possibilities for immersive and physically realistic virtual world generation.
