InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation
Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann
TL;DR
This work addresses the robustness and generalization of indoor monocular depth estimation across diverse space types, an issue neglected by NYUv2-centric benchmarks. It introduces InSpaceType, a high-resolution RGBD dataset with a hierarchical space-type taxonomy, and benchmarks 12 leading methods to reveal pronounced performance imbalances across space types. The authors analyze how training data biases and simple mitigation strategies affect cross-type performance and demonstrate that generalization to unseen space types remains challenging. They advocate for space-type-aware evaluation to improve robustness and safety in real-world deployments, and show that training on InSpaceType can improve zero-shot generalization to other datasets, albeit with limitations still to be addressed.
Abstract
Indoor monocular depth estimation has attracted increasing research interest. Most previous works have focused on methodology, experimenting primarily on the NYU-Depth-V2 (NYUv2) dataset and reporting only overall performance on its test set. However, little is known about robustness and generalization when monocular depth estimation methods are applied to real-world scenarios, where highly varied and diverse functional \textit{space types}, such as a library or a kitchen, are present. Breaking performance down by space type is essential for understanding a pretrained model's performance variance. To facilitate our investigation of robustness and to address the limitations of previous works, we collect InSpaceType, a high-quality and high-resolution RGBD dataset for general indoor environments. We benchmark 12 recent methods on InSpaceType and find that they suffer severely from performance imbalance across space types, which reveals their underlying bias. We extend our analysis to 4 other datasets, 3 mitigation approaches, and the ability to generalize to unseen space types. Our work is the first in-depth investigation of performance imbalance across space types for indoor monocular depth estimation. It draws attention to potential safety concerns when models are deployed without considering space types, and it sheds light on potential ways to improve robustness. See \url{https://depthcomputation.github.io/DepthPublic} for the data and the supplementary document. The benchmark list on the GitHub project page is kept up to date with the latest monocular depth estimation methods.
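
To make the idea of a per-space-type performance breakdown concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes each test sample carries a space-type label and uses two metrics that are common in monocular depth evaluation, absolute relative error (AbsRel) and the δ < 1.25 accuracy; the abstract does not state which metrics the benchmark reports. The function names `depth_metrics` and `breakdown_by_space_type` are hypothetical, and depth maps are represented as flat sequences of floats for simplicity.

```python
from collections import defaultdict
from statistics import fmean


def depth_metrics(pred, gt):
    """Return (AbsRel, delta1) over pixels with valid ground truth (gt > 0).

    pred and gt are flat sequences of depths in the same unit.
    Assumption: predictions are strictly positive.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    abs_rel = fmean(abs(p - g) / g for p, g in pairs)
    delta1 = fmean(1.0 if max(p / g, g / p) < 1.25 else 0.0 for p, g in pairs)
    return abs_rel, delta1


def breakdown_by_space_type(samples):
    """Average per-image metrics within each space type.

    samples: iterable of (space_type, pred, gt) tuples.
    Returns {space_type: {"abs_rel": ..., "delta1": ..., "n": ...}}.
    """
    per_type = defaultdict(list)
    for space_type, pred, gt in samples:
        per_type[space_type].append(depth_metrics(pred, gt))
    return {
        space_type: {
            "abs_rel": fmean(m[0] for m in metrics),
            "delta1": fmean(m[1] for m in metrics),
            "n": len(metrics),
        }
        for space_type, metrics in per_type.items()
    }


# Toy data: the zero in gt marks an invalid pixel that is ignored.
gt = [2.0, 4.0, 0.0]
samples = [
    ("kitchen", [2.0, 4.0, 1.0], gt),  # exact prediction on valid pixels
    ("library", [3.0, 6.0, 1.0], gt),  # 50% overestimate on valid pixels
]
report = breakdown_by_space_type(samples)
```

Comparing the per-type entries of `report`, rather than a single average over all samples, is what exposes the kind of imbalance the abstract describes: a model can look strong overall while performing poorly on an under-represented space type.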
