Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation

Muhammad Saif Ullah Khan, Sankalp Sinha, Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal

TL;DR

This work introduces “Shape2.5D,” a novel, large-scale dataset that provides depth and surface normal maps for texture-less object reconstruction and demonstrates its ability to support the development of algorithms that robustly estimate depth and normals from RGB images and perform voxel reconstruction.

Abstract

Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 1.17 million frames spanning over 39,772 3D models and 48 unique objects, our dataset provides depth and surface normal maps for texture-less object reconstruction. The proposed dataset includes synthetic images rendered with 3D modeling software to simulate various lighting conditions and viewing angles. It also includes a real-world subset comprising 4,672 frames captured with a depth camera. Our comprehensive benchmarks demonstrate the dataset's ability to support the development of algorithms that robustly estimate depth and normals from RGB images and perform voxel reconstruction. Our open-source data generation pipeline allows the dataset to be extended and adapted for future research. The dataset is publicly available at https://github.com/saifkhichi96/Shape25D.

Paper Structure

This paper contains 19 sections, 4 figures, and 12 tables.

Figures (4)

  • Figure 1: Depth Complexity: (a) A simple rubber duck, (b) the detailed San Diego Convention Center, and (c) an ornate Thai statue.
  • Figure 2: HDRI Environments. We use 579 real-world backgrounds, including indoor and outdoor scenes during the day and night.
  • Figure 3: Synthetic (B) Rendering Configurations. We render over 39k 3D models from ShapeNet from different angles. The azimuth angles range from 0 to 360 degrees, while the elevation angles are between -45 and 45 degrees (i.e., 0 to 45 and 315 to 360 degrees). The in-plane rotation and camera field of view are fixed at 0 and 25 degrees, respectively. The scale of the shapes is varied by adjusting the camera distance between 1 and 3.5.
  • Figure 4: Category-wise IoU for the val and test sets. Comparison of 3D-RETR, Pix2Vox, and Pix2Vox++ using various numbers of input images on the validation and test sets. Best viewed in color and zoomed-in on a screen.
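
The rendering configuration described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the uniform sampling, the z-up spherical-coordinate convention, and the names CameraPose, sample_pose, and wrap_elevation are assumptions made for illustration. Only the angle ranges, the fixed in-plane rotation and field of view, and the camera distance range come from the caption.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical container for one rendering viewpoint."""
    azimuth_deg: float                  # 0 to 360 degrees (from the caption)
    elevation_deg: float                # -45 to 45 degrees (from the caption)
    distance: float                     # 1 to 3.5 (from the caption)
    in_plane_rotation_deg: float = 0.0  # fixed at 0 (from the caption)
    fov_deg: float = 25.0               # fixed at 25 degrees (from the caption)

    def position(self):
        """Camera position on a sphere around the origin (z-up convention assumed)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        x = self.distance * math.cos(el) * math.cos(az)
        y = self.distance * math.cos(el) * math.sin(az)
        z = self.distance * math.sin(el)
        return (x, y, z)


def sample_pose(rng):
    """Draw one pose; uniform sampling is an assumption, not stated in the source."""
    return CameraPose(
        azimuth_deg=rng.uniform(0.0, 360.0),
        elevation_deg=rng.uniform(-45.0, 45.0),
        distance=rng.uniform(1.0, 3.5),
    )


def wrap_elevation(elevation_deg):
    """Map a signed elevation to the 0 to 360 convention (e.g. -45 becomes 315)."""
    return elevation_deg % 360.0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for _ in range(3):
        pose = sample_pose(rng)
        print(pose, pose.position(), wrap_elevation(pose.elevation_deg))
```

Varying only the camera distance changes the apparent scale of the shape while the field of view stays fixed, which matches the scale variation described in the caption.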