No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang
TL;DR
This work tackles the inherent ambiguity in 360° room layout annotations by predicting two layouts per image: an enclosed layout that stops at ambiguous regions and an extended layout that covers all visible areas. Its Bi-Layout architecture uses two global context embeddings and a shared feature guidance module to generate both predictions efficiently, and a new metric disambiguates ground-truth layouts so that evaluation does not require manual correction of ambiguous annotations. Results on MatterportLayout and ZInD show state-of-the-art performance; on MatterportLayout, 3DIoU improves from 81.70% to 82.57% on the full test set and from 54.80% to 59.97% on subsets with significant ambiguity. Comparing the two predictions also lets the model detect ambiguous regions. The approach offers a compact solution for multi-layout reasoning, with practical implications for indoor scene understanding and downstream applications.
Abstract
Inherent ambiguity in layout annotations poses significant challenges to developing accurate 360° room layout estimation models. To address this issue, we propose a novel Bi-Layout model capable of predicting two distinct layout types. One stops at ambiguous regions, while the other extends to encompass all visible areas. Our model employs two global context embeddings, where each embedding is designed to capture specific contextual information for each layout type. With our novel feature guidance module, the image feature retrieves relevant context from these embeddings, generating layout-aware features for precise bi-layout predictions. A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing the two predictions. To circumvent the need for manual correction of ambiguous annotations during testing, we also introduce a new metric for disambiguating ground truth layouts. Our method outperforms leading approaches on benchmark datasets. Specifically, on the MatterportLayout dataset, it improves 3DIoU from 81.70% to 82.57% on the full test set and from 54.80% to 59.97% on subsets with significant ambiguity. Project page: https://liagm.github.io/Bi_Layout/
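The two ideas in the abstract that lend themselves to a short illustration are detecting ambiguous regions by comparing the two predictions, and scoring against ground truth whose layout type is unknown. The sketch below is not the authors' implementation. It assumes a simplified representation in which each layout is a list of camera-to-floor-boundary distances, one per image column, and it uses a 2D floor-plan IoU rather than the 3DIoU reported above. The function names, the `rel_tol` threshold, and the best-of-two scoring rule are all hypothetical: the abstract says only that the metric disambiguates ground-truth layouts, not how.

```python
from typing import List, Sequence, Tuple


def polar_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """2D IoU of two star-shaped floor plans sampled at the same angles.

    Each value is the distance from the camera to the floor boundary in one
    image column. With piecewise-constant sampling, the area of each angular
    sector is proportional to r**2, so the constant angular step cancels.
    """
    if len(a) != len(b) or not a:
        raise ValueError("layouts must be non-empty and equally sampled")
    inter = sum(min(x, y) ** 2 for x, y in zip(a, b))
    union = sum(max(x, y) ** 2 for x, y in zip(a, b))
    return inter / union if union > 0 else 0.0


def ambiguous_columns(
    enclosed: Sequence[float],
    extended: Sequence[float],
    rel_tol: float = 0.1,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Indices of columns where the two predictions disagree noticeably."""
    return [
        i
        for i, (e, x) in enumerate(zip(enclosed, extended))
        if abs(x - e) > rel_tol * max(e, x)
    ]


def best_of_two_iou(
    enclosed: Sequence[float],
    extended: Sequence[float],
    gt: Sequence[float],
) -> Tuple[float, str]:
    """Assumed scoring rule: keep whichever prediction matches the GT better."""
    scores = {
        "enclosed": polar_iou(enclosed, gt),
        "extended": polar_iou(extended, gt),
    }
    label = max(scores, key=scores.get)
    return scores[label], label


# Toy example: the room opens into a deeper area in the last two columns.
enclosed_pred = [2.0] * 8
extended_pred = [2.0] * 6 + [4.0] * 2
ground_truth = list(extended_pred)  # the annotator happened to label the extended layout

regions = ambiguous_columns(enclosed_pred, extended_pred)
score, chosen = best_of_two_iou(enclosed_pred, extended_pred, ground_truth)
```

In this toy example the two predictions differ only in the last two columns, so those are flagged as ambiguous, and the extended prediction is the one matched against the ground truth. A single-layout model that predicted only the enclosed layout would be penalized here even though its output is a reasonable reading of the scene.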
