uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
Jonathan Lee, Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Fu-En Wang, Yi-Hsuan Tsai, Min Sun
TL;DR
uLayout is a unified, end-to-end model for room layout estimation that handles both perspective and panoramic images. It projects every input into a shared equirectangular space and aligns the horizon of each perspective image by shifting it vertically to a suitable latitude. A ResNet-50 feature extractor shared by both domains feeds domain-specific 1D convolutions and a SWG-Transformer, which together capture local and global geometry. Training uses a joint loss that combines image-domain boundary terms with horizon-depth terms. Joint training on panoramic and perspective data yields results competitive with the state of the art on standard benchmarks and notably improves perspective boundary accuracy when LSUN data is included, while significantly reducing computation through efficient, shared feature extraction. By bridging the gap between the two image modalities, the approach enables robust cross-domain generalization and is practical for real-world room layout tasks. The publicly available code supports reproducibility and adaptation in downstream applications.
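The first step above, placing a perspective image inside the shared equirectangular space, can be illustrated with standard pinhole-to-spherical geometry. The sketch below is not the authors' implementation: the function name `perspective_to_equirect`, the zero-pitch and zero-roll camera, the principal point at the image centre, continuous pixel coordinates, and the `row_shift` parameter standing in for the vertical-shift allocation are all assumptions made for illustration.

```python
import math


def perspective_to_equirect(u, v, width, height, hfov_deg, pano_w, pano_h, row_shift=0.0):
    """Map continuous pixel (u, v) of a pinhole image to (col, row) in an equirectangular panorama.

    Assumptions (hypothetical, not from the paper): zero pitch/roll camera,
    principal point at (width / 2, height / 2), square pixels, and a
    `row_shift` (in panorama rows) that moves the projected patch to a
    chosen latitude band, standing in for the vertical-shift allocation.
    """
    # Focal length in pixels from the horizontal field of view.
    f = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    # Camera-frame ray: x to the right, y up, z forward (image v grows downward).
    x = u - width / 2.0
    y = height / 2.0 - v
    z = f
    lon = math.atan2(x, z)                 # longitude in [-pi, pi]
    lat = math.atan2(y, math.hypot(x, z))  # latitude in [-pi/2, pi/2]
    # Longitude maps linearly to columns, latitude linearly to rows (top row = +pi/2).
    col = (lon / (2.0 * math.pi) + 0.5) * pano_w
    row = (0.5 - lat / math.pi) * pano_h + row_shift
    return col, row


if __name__ == "__main__":
    # 640x480 image with a 90-degree horizontal FoV, projected into a 1024x512 panorama.
    print(perspective_to_equirect(320, 240, 640, 480, 90.0, 1024, 512))  # image centre
    print(perspective_to_equirect(0, 240, 640, 480, 90.0, 1024, 512))    # left edge
```

With a 90° horizontal FoV and a 1024-column panorama, the image centre maps to column 512 and the left edge to roughly column 384, so the perspective image covers only a quarter of the panorama's columns. This FoV gap is what the domain-specific conditioning is meant to address.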
Abstract
We present uLayout, a unified model for estimating room layout geometry from both perspective and panoramic images, whereas traditional solutions require a different model design for each image type. The key idea of our solution is to unify both domains in the equirectangular projection; in particular, we allocate each perspective image to the most suitable latitude coordinate so that both domains can be exploited seamlessly. To address the Field-of-View (FoV) difference between the input domains, we design uLayout with a shared feature extractor plus an extra 1D convolution layer that conditions each domain's input differently. This conditioning allows us to efficiently formulate layout estimation as a column-wise feature regression problem regardless of the input FoV. This simple yet effective approach achieves performance competitive with current state-of-the-art solutions and demonstrates, for the first time, a single end-to-end model for both domains. Extensive experiments on the real-world datasets LSUN, Matterport3D, PanoContext, and Stanford 2D-3D demonstrate the contribution of our approach. Code is available at https://github.com/JonathanLee112/uLayout.
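The column-wise formulation described above can be sketched as: collapse a shared 2D feature map into one feature vector per image column, pass it through a 1D convolution selected by the input domain, and regress per-column outputs. This NumPy sketch is a hedged illustration only. It omits the ResNet-50 backbone and SWG-Transformer, uses random weights, and the names `to_column_features`, `conv1d`, `ColumnwiseLayoutHead`, the mean-over-height pooling, and the two outputs per column (e.g., ceiling and floor boundaries) are hypothetical choices rather than uLayout's actual design.

```python
import numpy as np


def to_column_features(feature_map):
    """Hypothetical pooling: (C, H, W) -> (C, W), one feature vector per image column."""
    return feature_map.mean(axis=1)


def conv1d(features, weight, bias):
    """'Same'-padded 1D cross-correlation along the column axis.

    features: (C_in, W); weight: (C_out, C_in, K) with odd K; bias: (C_out,).
    """
    c_out, _, k = weight.shape
    pad = k // 2
    x = np.pad(features, ((0, 0), (pad, pad)))
    width = features.shape[1]
    out = np.empty((c_out, width))
    for col in range(width):
        window = x[:, col:col + k]  # (C_in, K) neighbourhood of this column
        out[:, col] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return out


class ColumnwiseLayoutHead:
    """Hypothetical head: shared column features -> domain-specific 1D conv -> per-column outputs."""

    def __init__(self, channels, kernel=3, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        # One 1D convolution per input domain; everything else is shared.
        self.domain_conv = {
            d: (rng.normal(scale=0.1, size=(channels, channels, kernel)), np.zeros(channels))
            for d in ("pano", "persp")
        }
        self.out_w = rng.normal(scale=0.1, size=(n_out, channels))
        self.out_b = np.zeros(n_out)

    def __call__(self, column_features, domain):
        w, b = self.domain_conv[domain]
        h = np.maximum(conv1d(column_features, w, b), 0.0)  # ReLU
        # Shared linear regressor applied independently to every column.
        return self.out_w @ h + self.out_b[:, None]           # (n_out, W)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    head = ColumnwiseLayoutHead(channels=8)
    pano = to_column_features(rng.normal(size=(8, 16, 256)))
    persp = to_column_features(rng.normal(size=(8, 16, 64)))
    print(head(pano, "pano").shape)    # (2, 256)
    print(head(persp, "persp").shape)  # (2, 64)
```

Because the head operates per column, the same regressor accepts a 256-column panorama or a 64-column perspective strip; only the 1D convolution differs by domain.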
