Empirical Comparison of Four Stereoscopic Depth Sensing Cameras for Robotics Applications

Lukas Rustler, Vojtech Volprecht, Matej Hoffmann

TL;DR

This work benchmarks four stereoscopic RGB-D cameras (D435, D455, ZED 2, OAK-D Pro) on planar-surface and object-perception tasks, using three scenarios with ground-truth references. The authors evaluate six metrics (bias, standard deviation, Chamfer Distance, Jaccard Similarity, F-score, angle between normals) and publish over 12,000 RGB-D frames for community use. Key findings: ZED 2 offers the strongest performance at longer distances but requires a CUDA-enabled GPU; the D435 performs best at close range (up to one meter), with an error under 1 cm in all tested scenarios; OAK-D Pro provides onboard AI but struggles with more complex shapes; and D455 provides balanced performance at mid-to-long ranges. Overall, the results guide sensor selection by distance, scene geometry, and hardware constraints, and the accompanying public dataset enables ongoing evaluation of RGB-D perception pipelines.
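To make some of the listed metrics concrete, the following is a minimal sketch of how bias, standard deviation, Chamfer Distance, and F-score could be computed for point clouds with NumPy. It is not the authors' pipeline: the function names, the brute-force nearest-neighbor search, the "sum of two mean distances" Chamfer convention, and the distance threshold are all assumptions made for illustration, and the paper's exact definitions may differ.

```python
import numpy as np


def nearest_distances(src, dst):
    """Distance from each point in src (N, 3) to its nearest neighbor in dst (M, 3).

    Brute force, O(N * M) memory; fine for small illustrative clouds only.
    """
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def chamfer_distance(a, b):
    """Symmetric Chamfer distance (assumed convention: sum of the two
    mean nearest-neighbor distances, not squared)."""
    return nearest_distances(a, b).mean() + nearest_distances(b, a).mean()


def f_score(captured, reference, threshold):
    """Harmonic mean of precision and recall at a distance threshold.

    precision: fraction of captured points within `threshold` of the reference
    recall:    fraction of reference points within `threshold` of the capture
    """
    precision = (nearest_distances(captured, reference) <= threshold).mean()
    recall = (nearest_distances(reference, captured) <= threshold).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plane_bias_and_std(measured_distances, true_distance):
    """Bias (mean error, 0 is ideal) and standard deviation of plane distance estimates."""
    measured = np.asarray(measured_distances, dtype=float)
    return measured.mean() - true_distance, measured.std()


if __name__ == "__main__":
    # Hypothetical toy data: a reference cloud and a capture shifted by 1 cm in z.
    reference = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.0, 0.1, 1.0]])
    captured = reference + np.array([0.0, 0.0, 0.01])
    print("Chamfer:", chamfer_distance(captured, reference))
    print("F-score @ 2 cm:", f_score(captured, reference, 0.02))
    print("bias, std:", plane_bias_and_std([1.01, 1.03], 1.0))
```

For clouds of realistic size, the brute-force search would normally be replaced by a spatial index such as a k-d tree; the metric definitions themselves stay the same.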

Abstract

Depth sensing is an essential technology in robotics and many other fields. Many depth sensing (or RGB-D) cameras are available on the market, and selecting the best one for a given application can be challenging. In this work, we tested four stereoscopic RGB-D cameras, which sense distance by using two images taken from slightly different views. We empirically compared the four cameras (Intel RealSense D435, Intel RealSense D455, StereoLabs ZED 2, and Luxonis OAK-D Pro) in three scenarios: (i) planar surface perception, (ii) plastic doll perception, (iii) household object perception (YCB dataset). We recorded and evaluated more than 3,000 RGB-D frames for each camera. For table-top robotics scenarios with distances to objects of up to one meter, the best performance is provided by the D435 camera, which perceives with an error under 1 cm in all of the tested scenarios. For longer distances, the other three models perform better, making them more suitable for some mobile robotics applications. OAK-D Pro additionally offers integrated AI modules (e.g., object and human keypoint detection). ZED 2 is the best camera overall, keeping the error under 3 cm even at 4 meters. However, it is not a standalone device and requires a computer with a GPU for depth data acquisition. All data (more than 12,000 RGB-D frames) are publicly available at https://rustlluk.github.io/rgbd-comparison.
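The stereoscopic principle mentioned in the abstract (distance from two slightly different views) can be sketched with the standard rectified pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B the baseline between the two imagers, and d the disparity in pixels. This relation is textbook stereo geometry rather than something stated in the abstract, and the focal length, baseline, and disparity error used below are hypothetical values, not specifications of any of the four cameras. The first-order error model also suggests why depth error tends to grow with distance.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd.

    The error grows quadratically with depth and shrinks with a longer baseline.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    # Hypothetical camera parameters, chosen only for illustration.
    focal_px = 640.0
    baseline_m = 0.05
    disparity_error_px = 0.1

    print("depth at 32 px disparity:", depth_from_disparity(32.0, focal_px, baseline_m))
    for z in (1.0, 2.0, 4.0):
        err = depth_error(z, focal_px, baseline_m, disparity_error_px)
        print(f"expected depth error at {z} m: {err:.4f} m")
```

Under this model, doubling the distance quadruples the expected depth error for a fixed disparity error, which is one plausible reason why cameras can rank differently at table-top and longer ranges.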

Paper Structure (16 sections, 7 equations, 18 figures, 3 tables)

Figures (18)

  • Figure 1: RGB-D cameras used in the experiments. The cameras are shown to scale; the width of the D435 camera in a) is 90 mm.
  • Figure 2: Experimental setup illustration: plastic doll perception.
  • Figure 3: Objects from the YCB dataset used in the experiments.
  • Figure 4: Normals of the ground-truth (blue) and captured (red) point cloud of the plastic doll.
  • Figure 5: Planar surface perception: bias and standard deviation. The values for each distance and camera are averaged over 30 frames. (Top) Bias in distance estimation to the plane (a bias of 0 is correct). (Bottom) Standard deviation of the estimates.
  • ...and 13 more figures