InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation

Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann

TL;DR

This work addresses the robustness and generalization of indoor monocular depth estimation across diverse space types, an issue neglected by NYUv2-centric benchmarks. It introduces InSpaceType, a high-resolution RGBD dataset with a hierarchical space-type taxonomy, and benchmarks 12 leading methods to reveal pronounced performance imbalances across space types. The authors analyze how training data biases and simple mitigation strategies affect cross-type performance and demonstrate that generalization to unseen space types remains challenging. They advocate for space-type-aware evaluation to improve robustness and safety in real-world deployments, and show that training on InSpaceType can improve zero-shot generalization to other datasets, albeit with limitations still to be addressed.

Abstract

Indoor monocular depth estimation has attracted increasing research interest. Most previous works have focused on methodology, primarily experimenting with the NYU-Depth-V2 (NYUv2) dataset, and have concentrated only on overall performance over the test set. However, little is known about robustness and generalization when monocular depth estimation methods are applied to real-world scenarios with highly varying and diverse functional space types, such as a library or a kitchen. A breakdown of performance by space type is essential for understanding a pretrained model's performance variance. To facilitate our investigation of robustness and address the limitations of previous works, we collect InSpaceType, a high-quality and high-resolution RGBD dataset for general indoor environments. We benchmark 12 recent methods on InSpaceType and find that they suffer severely from performance imbalance across space types, which reveals their underlying bias. We extend our analysis to 4 other datasets, 3 mitigation approaches, and the ability to generalize to unseen space types. Our work marks the first in-depth investigation of performance imbalance across space types for indoor monocular depth estimation, drawing attention to potential safety concerns when models are deployed without considering space types, and shedding light on potential ways to improve robustness. See https://depthcomputation.github.io/DepthPublic for data and the supplementary document. The benchmark list on the GitHub project page is kept up to date with the latest monocular depth estimation methods.
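
To make the idea of a per-space-type breakdown concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes two metrics that are commonly used in depth estimation (absolute relative error and the delta < 1.25 accuracy); the abstract does not name its metrics. The function names, the sample layout, and the toy depth values are all hypothetical.

```python
from collections import defaultdict


def valid_pairs(pred, gt):
    """Keep only pixels where both prediction and ground truth are positive."""
    return [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]


def abs_rel(pred, gt):
    """Mean absolute relative error over valid pixels (lower is better)."""
    pairs = valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, thresh=1.25):
    """Fraction of valid pixels with max(p/g, g/p) < thresh (higher is better)."""
    pairs = valid_pairs(pred, gt)
    return sum(1 for p, g in pairs if max(p / g, g / p) < thresh) / len(pairs)


def breakdown_by_space_type(samples):
    """Average per-image metrics within each space type.

    samples: iterable of (space_type, predicted_depths, ground_truth_depths),
    where the depth lists are flattened per-pixel values in metres
    (a hypothetical layout chosen for this sketch).
    """
    per_type = defaultdict(list)
    for space_type, pred, gt in samples:
        per_type[space_type].append((abs_rel(pred, gt), delta1(pred, gt)))

    report = {}
    for space_type, scores in per_type.items():
        n = len(scores)
        report[space_type] = {
            "abs_rel": sum(s[0] for s in scores) / n,
            "delta1": sum(s[1] for s in scores) / n,
            "num_images": n,
        }
    return report


# Toy data: a perfect prediction for "kitchen", a poor one for "library".
samples = [
    ("kitchen", [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    ("library", [2.0, 4.0, 3.0], [1.0, 2.0, 3.0]),
]
report = breakdown_by_space_type(samples)
for space_type, metrics in sorted(report.items()):
    print(space_type, metrics)
```

Reporting one row per space type, rather than a single test-set average, is what exposes the kind of imbalance the abstract describes: an overall score can look acceptable while one space type performs far worse than the others.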

Paper Structure (8 sections, 4 figures, 8 tables)

This paper contains 8 sections, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Data samples of our InSpaceType Dataset.
  • Figure 2: Statistics for InSpaceType evaluation set.
  • Figure 3: Cases that training on NYUv2 alone does not cover. InSpaceType contains several object arrangements that NYUv2 does not include, such as wall-mounted air conditioners and phones, which are mostly exclusive to Asian-style rooms. A tilted viewing direction (pitch angle) is shown in (B), where training on NYUv2 alone cannot give robust results because NYUv2 has only minor changes in viewing pitch angle. DPT, which in its setting trains on 10 different datasets plus NYUv2 (mixed-set training), attains more pleasing results. To verify generalizability, InSpaceType serves as a testbed that helps uncover cases that training on the popular NYUv2 alone does not cover.
  • Figure 4: Visualization of cross-group generalization.