Self-training Room Layout Estimation via Geometry-aware Ray-casting
Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Jonathan Lee, Yi-Hsuan Tsai, Min Sun
TL;DR
This work tackles unsupervised adaptation for room layout estimation with a geometry-aware self-training framework that uses ray-casting to aggregate multiple noisy estimates into pseudo-labels. The method defines a multi-view consistency objective and a multi-cycle ray-casting procedure to handle occlusions, paired with a Weighted Distance Loss that emphasizes distant geometry. Results on synthetic and real datasets (e.g., HM3D-MVL, MP3D-FPE, ZInD) show substantial improvements over the prior 360-MLC approach with both HorizonNet and LGTNet backbones, including in challenging occlusion scenarios, and approach supervised baselines in some cases. The approach enables robust room-layout learning from unlabeled panoramic data and offers a path toward scalable, annotation-free 3D room understanding in diverse environments.
Abstract
In this paper, we introduce a novel geometry-aware self-training framework for room layout estimation models on unseen scenes with unlabeled data. Our approach utilizes a ray-casting formulation to aggregate multiple estimates from different viewing positions, enabling the computation of reliable pseudo-labels for self-training. In particular, our ray-casting approach enforces multi-view consistency along all ray directions and prioritizes spatial proximity to the camera view for geometry reasoning. As a result, our geometry-aware pseudo-labels effectively handle complex room geometries and occluded walls without relying on assumptions such as Manhattan World or planar room walls. Evaluation on publicly available datasets, including synthetic and real-world scenarios, demonstrates significant improvements in current state-of-the-art layout models without using any human annotation.
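To make the aggregation idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes that layout boundary estimates from several views have already been projected to 2D floor-plan points in a shared world frame, bins them by ray direction around a target camera, and takes the per-ray median distance as the pseudo-label. The function name `raycast_pseudo_label`, the parameter `n_rays`, and the median rule are all illustrative assumptions; the paper's actual consistency and proximity criteria, and its multi-cycle handling of occlusions, are not reproduced here.

```python
import numpy as np


def raycast_pseudo_label(points_xy, cam_xy, n_rays=360):
    """Aggregate 2D boundary points from several views into one range per ray.

    points_xy: (N, 2) boundary points from all views, in a shared world frame.
    cam_xy:    (2,) position of the target camera in that frame.
    Returns an (n_rays,) array of distances; NaN where a ray's bin is empty.

    Hypothetical helper: the median is a stand-in aggregation rule,
    not the rule used in the paper.
    """
    rel = np.asarray(points_xy, dtype=float) - np.asarray(cam_xy, dtype=float)
    dist = np.hypot(rel[:, 0], rel[:, 1])      # distance from camera to each point
    theta = np.arctan2(rel[:, 1], rel[:, 0])   # ray direction in [-pi, pi]

    # Map each direction to one of n_rays equal angular bins.
    bins = np.floor((theta + np.pi) / (2 * np.pi) * n_rays).astype(int) % n_rays

    ranges = np.full(n_rays, np.nan)
    for b in np.unique(bins):
        # Robust per-ray aggregate over all estimates that fall along this ray.
        ranges[b] = np.median(dist[bins == b])
    return ranges
```

In this sketch, rays that no estimate falls on are left as NaN so that a training loss can mask them out rather than treat them as supervision.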
