On the Black-box Explainability of Object Detection Models for Safe and Trustworthy Industrial Applications

Alain Andres, Aitor Martinez-Seras, Ibai Laña, Javier Del Ser

TL;DR

This work tackles the lack of robust, model-agnostic explanations for object detectors in safety-critical industrial settings. It introduces D-MFPP, a segmentation-based mask extension of MFPP, and D-Deletion, a localization-aware extension of the Deletion metric, alongside adapting D-RISE for detectors. Through experiments on two real-world robotics datasets with YOLOv8, it demonstrates that D-RISE achieves strong faithfulness via D-Deletion and that D-MFPP can provide efficient, focused explanations with fewer masks, especially for localization. The results highlight the importance of localization-aware evaluation in multi-instance scenes and offer practical guidance for deploying explainability in industrial robotics, with code released for public use.

Abstract

In the realm of human-machine interaction, artificial intelligence has become a powerful tool for accelerating data modeling tasks. Object detection methods have achieved outstanding results and are widely used in critical domains like autonomous driving and video surveillance. However, their adoption in high-risk applications, where errors may cause severe consequences, remains limited. Explainable Artificial Intelligence methods aim to address this issue, but many existing techniques are model-specific and designed for classification tasks, making them less effective for object detection and difficult for non-specialists to interpret. In this work we focus on model-agnostic explainability methods for object detection models and propose D-MFPP, an extension of the Morphological Fragmental Perturbation Pyramid (MFPP) technique based on segmentation-based masks to generate explanations. Additionally, we introduce D-Deletion, a novel metric combining faithfulness and localization, adapted specifically to meet the unique demands of object detectors. We evaluate these methods on real-world industrial and robotic datasets, examining the influence of parameters such as the number of masks, model size, and image resolution on the quality of explanations. Our experiments use single-stage object detection models applied to two safety-critical robotic environments: i) a shared human-robot workspace where safety is of paramount importance, and ii) an assembly area of battery kits, where safety is critical due to the potential for damage among high-risk components. Our findings evince that D-Deletion effectively gauges the performance of explanations when multiple elements of the same class appear in a scene, while D-MFPP provides a promising alternative to D-RISE when fewer masks are used.

Paper Structure

This paper contains 26 sections, 6 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: Example of three masks generated using Sliding Window (top), RISE (middle), and MFPP (bottom). Unlike the other two, MFPP masks depend on the model's input image. The sample shown here comes from the battery assembly dataset detailed in the Materials and Methods section.
  • Figure 2: Dataset 1 (Human-Robot collaboration): Data are captured from cameras located in 3 different positions. Faces are blurred in all images of this dataset to preserve anonymity.
  • Figure 3: Dataset 2 (Battery Assembly kit): The setup in which a robotic arm would assemble the kit based on a bird's-eye view of the table where all components are expected to be; (left) a theoretical setup; (right) an actual sample.
  • Figure 4: Illustration of a collaborative workspace featuring two humans and a robotic arm. The first row shows the original image. The second row displays the image with the 10% most important pixels removed for each human, as identified by an XAI method. In the third row, the Deletion metric curve, which only considers class type, shows a high probability score even when the primary human is largely occluded by the other person. The fourth row presents the D-Deletion metric curve, which incorporates a localization component, providing a more accurate measure of explanation importance by considering the positions of entities within the image. A lower area under the curve indicates a better explanation.
  • Figure 5: Heatmaps obtained by applying RISE (left) and D-RISE (right) for the detection of a human in the Human-Robot Dataset.
  • ...and 5 more figures
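
The contrast drawn in the Figure 4 caption between Deletion and D-Deletion can be illustrated with a toy sketch. This section does not give the exact D-Deletion formula, so the localization term used below (class score multiplied by the IoU with the target box) is an assumption made for illustration only. The names `toy_detector`, `deletion_curves`, and `auc` are hypothetical, and the "detector" is a stand-in that scores each known box by its mean pixel intensity, not a real model such as YOLOv8.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def toy_detector(image, boxes):
    """Hypothetical detector: one detection per box, scored by mean intensity."""
    return [(box, float(image[box[1]:box[3], box[0]:box[2]].mean()))
            for box in boxes]

def deletion_curves(image, saliency, target_box, detect, steps=10):
    """Remove pixels from most to least salient and track two scores."""
    order = np.argsort(saliency.ravel(), kind="stable")[::-1]
    flat = image.astype(float).ravel().copy()
    fractions = np.linspace(0.0, 1.0, steps + 1)
    class_only, loc_aware = [], []
    for frac in fractions:
        k = int(round(frac * flat.size))
        flat[order[:k]] = 0.0  # delete the k most salient pixels
        dets = detect(flat.reshape(image.shape))
        # Class-only score: best score of the class anywhere in the image.
        class_only.append(max(s for _, s in dets))
        # Assumed localization-aware score: weight each score by box overlap.
        loc_aware.append(max(s * iou(b, target_box) for b, s in dets))
    return fractions, np.array(class_only), np.array(loc_aware)

def auc(x, y):
    """Area under a curve by the trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

# Two instances of the same class; the explanation targets box_a only.
box_a, box_b = (2, 2, 8, 8), (12, 12, 18, 18)
image = np.zeros((20, 20))
for x1, y1, x2, y2 in (box_a, box_b):
    image[y1:y2, x1:x2] = 1.0
saliency = np.zeros_like(image)
saliency[2:8, 2:8] = 1.0  # a perfect explanation for box_a

detect = lambda img: toy_detector(img, [box_a, box_b])
fractions, cls_curve, loc_curve = deletion_curves(image, saliency, box_a, detect)
print("class-only AUC:", auc(fractions, cls_curve))
print("localization-aware AUC:", auc(fractions, loc_curve))
```

In this toy setting the class-only curve stays high after the target is deleted, because the second instance of the same class still produces a confident detection. The localization-aware curve drops as soon as the target region is removed, giving the lower area under the curve that the caption associates with a better explanation.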