A Theoretical and Practical Framework for Evaluating Uncertainty Calibration in Object Detection

Pedro Conde, Rui L. Lopes, Cristiano Premebida

TL;DR

This work presents a novel theoretical and practical framework for evaluating uncertainty calibration in object detection systems. It comprises a new, comprehensive formulation of the concept through distinct formal definitions, together with three novel evaluation metrics derived from this theoretical foundation.

Abstract

The proliferation of Deep Neural Networks has made machine learning systems increasingly present in real-world applications. Consequently, there is a growing demand for highly reliable models in many domains, which makes uncertainty calibration a pivotal problem for the future of deep learning. This is especially true for object detection systems, which are commonly deployed in safety-critical applications such as autonomous driving, robotics and medical diagnosis. For this reason, this work presents a novel theoretical and practical framework to evaluate object detection systems in the context of uncertainty calibration. The framework comprises a new, comprehensive formulation of this concept through distinct formal definitions, together with three novel evaluation metrics derived from this theoretical foundation. The robustness of the proposed uncertainty calibration metrics is shown through a series of representative experiments.

Paper Structure (12 sections, 13 equations, 4 figures)

Figures (4)

  • Figure 1: Evaluating QGC, SGC, EGCE and D-ECE against mAP using YOLOv5 models (Nano, Small, Medium, Large and Extra Large). mAP increases with model capacity, shown from left to right (highlighted as the grey-text legend in (a)). Results were obtained on the COCO dataset with a) an IoU threshold of 0.5 and b) results averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05; and on the PASCAL VOC dataset with c) an IoU threshold of 0.5 and d) results averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
  • Figure 2: Evaluating QGC, SGC, EGCE and D-ECE for increasing proportions of: a) FN detections; b) FP detections with confidence scores drawn from the uniform distribution $U[0.8,1]$; c) TP detections with confidence scores drawn from $U[0,0.2]$; d) FP detections with confidence scores drawn from $U[0,0.2]$; e) TP detections with confidence scores drawn from $U[0.8,1]$; f) TP detections with confidence scores drawn from $U[0.98,1]$.
  • Figure 3: Evaluating QGC, SGC, EGCE and D-ECE under distribution shifts of increasing intensity in the test data, using YOLOv5 (Small) with a) the COCO dataset and b) the PASCAL VOC dataset.
  • Figure 4: Evaluating QGC, SGC, EGCE and D-ECE after applying histogram binning (H.B.) and test-time augmentation (TTA), compared to a vanilla (V.) approach, i.e. with no calibration strategy, using YOLOv5 (Small) on the PASCAL VOC test set a) with no distribution shift and b) with level 5 distribution shift.
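
The experimental setups named in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It only shows the grid of IoU thresholds (0.5 to 0.95 in steps of 0.05) and how synthetic detections with confidence scores drawn from a uniform distribution such as $U[0.8,1]$ might be appended to a set of detections. The function names, the toy detector, and the choice to express the proportion relative to the number of original detections are assumptions made for illustration; they are not taken from the paper, and the metrics themselves (QGC, SGC, EGCE, D-ECE) are not implemented here.

```python
import numpy as np


def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """Return the IoU thresholds 0.50, 0.55, ..., 0.95 (10 values)."""
    n = int(round((stop - start) / step)) + 1
    return np.round(np.linspace(start, stop, n), 2)


def inject_detections(scores, is_tp, proportion, low, high, as_tp, rng):
    """Append synthetic detections with confidences drawn from U[low, high].

    Assumption: `proportion` is relative to the number of original detections.
    `as_tp` marks the injected detections as true (True) or false (False) positives.
    """
    n_new = int(round(proportion * len(scores)))
    new_scores = rng.uniform(low, high, size=n_new)
    new_labels = np.full(n_new, as_tp, dtype=bool)
    return (np.concatenate([scores, new_scores]),
            np.concatenate([is_tp, new_labels]))


rng = np.random.default_rng(0)

# Hypothetical toy detector: a detection with score s is a TP with probability s.
scores = rng.uniform(0.0, 1.0, size=1000)
is_tp = rng.uniform(size=1000) < scores

# Setup in the spirit of Figure 2 b): add 20% FP detections scored from U[0.8, 1].
aug_scores, aug_is_tp = inject_detections(
    scores, is_tp, proportion=0.2, low=0.8, high=1.0, as_tp=False, rng=rng
)

thresholds = iou_thresholds()
```

A calibration metric would then be computed on `aug_scores` and `aug_is_tp` for each proportion, and, for the averaged panels of Figure 1, once per value in `thresholds` before averaging.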

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Definition 3