Correlation of Object Detection Performance with Visual Saliency and Depth Estimation

Matthias Bartolo, Dylan Seychell

TL;DR

This study measures how object detection accuracy correlates with two fundamental visual tasks, depth prediction and visual saliency prediction, and finds that visual saliency correlates consistently more strongly with detection accuracy than depth prediction does.

Abstract

As object detection techniques continue to evolve, understanding their relationships with complementary visual tasks becomes crucial for optimising model architectures and computational resources. This paper investigates the correlations between object detection accuracy and two fundamental visual tasks: depth prediction and visual saliency prediction. Through comprehensive experiments using state-of-the-art models (DeepGaze IIE, Depth Anything, DPT-Large, and Itti's model) on COCO and Pascal VOC datasets, we find that visual saliency shows consistently stronger correlations with object detection accuracy (mA$ρ$ up to 0.459 on Pascal VOC) compared to depth prediction (mA$ρ$ up to 0.283). Our analysis reveals significant variations in these correlations across object categories, with larger objects showing correlation values up to three times higher than smaller objects. These findings suggest incorporating visual saliency features into object detection architectures could be more beneficial than depth information, particularly for specific object categories. The observed category-specific variations also provide insights for targeted feature engineering and dataset design improvements, potentially leading to more efficient and accurate object detection systems.

Paper Structure

This paper contains 19 sections, 2 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Comparison of outputs generated from various saliency and depth prediction models alongside the original image and annotations.
  • Figure 2: Sample images from the COCO dataset along with their corresponding ground truth masks, depth maps generated by the Depth Anything Model, and Pearson correlation values.
  • Figure 3: Sample images from the Pascal VOC dataset along with their corresponding ground truth masks, saliency maps generated by the DeepGaze IIE Model, and Pearson correlation values.
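
The figure captions refer to Pearson correlation values computed between ground truth masks and predicted depth or saliency maps. As a rough illustration of that kind of measurement, the sketch below computes a pixel-wise Pearson correlation between a binary mask and a predicted map, then averages it over several image pairs. The function names `pearson_map_correlation` and `mean_correlation` are hypothetical, and the simple averaging step is an assumption made for illustration; the paper's exact definition of mA$ρ$ may differ.

```python
import numpy as np


def pearson_map_correlation(mask, pred):
    """Pixel-wise Pearson correlation between a mask and a predicted map.

    Hypothetical helper: both inputs are assumed to be 2D arrays of the
    same shape (e.g. a binary object mask and a saliency or depth map).
    """
    m = np.asarray(mask, dtype=np.float64).ravel()
    p = np.asarray(pred, dtype=np.float64).ravel()
    if m.shape != p.shape:
        raise ValueError("mask and prediction must have the same shape")

    # Centre both signals, then normalise the covariance.
    m_c = m - m.mean()
    p_c = p - p.mean()
    denom = np.sqrt((m_c ** 2).sum() * (p_c ** 2).sum())
    if denom == 0:
        # Correlation is undefined when either map is constant.
        return float("nan")
    return float((m_c * p_c).sum() / denom)


def mean_correlation(pairs):
    """Average the per-image correlation, ignoring undefined (NaN) values.

    Assumption: a plain mean over images; not necessarily the paper's metric.
    """
    values = [pearson_map_correlation(m, p) for m, p in pairs]
    return float(np.nanmean(values))


if __name__ == "__main__":
    # Toy 4x4 example: an object occupying the central 2x2 region.
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0
    rng = np.random.default_rng(0)
    noisy_map = mask + 0.3 * rng.standard_normal(mask.shape)
    print(pearson_map_correlation(mask, noisy_map))
```

A map that highlights exactly the annotated object region gives a correlation of 1, an inverted map gives -1, and a constant map yields an undefined value that the averaging step skips.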