Challenging the Black Box: A Comprehensive Evaluation of Attribution Maps of CNN Applications in Agriculture and Forestry

Lars Nieradzik; Henrike Stephani; Jördis Sieburg-Rockel; Stephanie Helmling; Andrea Olbrich; Janis Keuper

TL;DR

This study evaluates the trustworthiness and practicality of attribution maps (AMs) for neural networks in the agriculture and forestry sectors, and finds that AMs often highlight features inconsistently and misalign with the features that domain experts consider important.

Abstract

In this study, we explore the explainability of neural networks in agriculture and forestry, specifically in fertilizer treatment classification and wood identification. The opaque nature of these models, often considered 'black boxes', is addressed through an extensive evaluation of state-of-the-art Attribution Maps (AMs), also known as class activation maps (CAMs) or saliency maps. Our comprehensive qualitative and quantitative analysis of these AMs uncovers critical practical limitations. Findings reveal that AMs frequently fail to consistently highlight crucial features and often misalign with the features considered important by domain experts. These discrepancies raise substantial questions about the utility of AMs in understanding the decision-making process of neural networks. Our study provides critical insights into the trustworthiness and practicality of AMs within the agriculture and forestry sectors, thus facilitating a better understanding of neural networks in these application areas.

Paper Structure (15 sections, 4 equations, 6 figures, 2 tables)

Introduction
Related Work
    Attribution methods
    Evaluation of attribution methods
Method
    Consistency
    Qualitative and quantitative evaluation of saliency maps
Evaluation
    Consistency and expert annotation
        Wood identification dataset
        Fertilizer treatment dataset
    Metrics and feature sharing
        Wood identification dataset
        Fertilizer treatment dataset
Discussion and Outlook

Figures (6)

Figure 1: Visualization of different attribution maps (AMs) on the same input image of a wood identification dataset. All the AMs focus on different regions, which also differ from the expert annotation. Notably, SmoothGradCAM++ (arXiv:1908.01224) appears to exclusively show noise.
Figure 2: The "Insertion" metric evaluated on an example image illustrates the impact of the parameter "blurring" in comparison to "no blurring". In (a), the starting image is a completely blurred image. In (b), the starting image is a black image. We input this modified image into the neural network to obtain a probability (see y-axis). First, the most important pixels of the original image are inserted. Then gradually less important pixels are inserted. At each step, the network predicts the probability of this modified image. The process ends with the complete original image and the original probability. The best saliency map in the plots is determined by computing the area under the curve (AUC).
Figure 3: This plot illustrates the degree of similarity among all attribution maps. The matrices were computed by averaging the individual metric results across all attribution maps in the wood identification dataset. Both similarity measures indicate a weak agreement among the different maps.
Figure 4: Visualization of different attribution maps for an individual image from the fertilizer dataset. As with the maps for the wood identification dataset, they are inconsistent in what they identify as important.
Figure 5: Attribution maps are intended to visualize the most crucial regions influencing the decision for a specific class in a given model. However, this image comparison reveals that the two saliency maps highlight vastly different regions for each class, even though both visualize the same model and classes. SmoothGrad tends to highlight the same regions regardless of the correct class (Euca), whereas GradCAM emphasizes distinct regions. This lack of consistency raises uncertainty about which region the model truly deems most important, making it challenging to identify features that are easily interpretable for humans.
...and 1 more figures
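
The "Insertion" procedure described in the caption of Figure 2 can be sketched roughly as follows. This is a minimal illustration based only on that caption, not the authors' implementation: the function name `insertion_curve`, the callable `predict_prob`, the step count `n_steps`, and the toy model in the usage part are all hypothetical. Passing a blurred copy of the image as `baseline` would correspond to the "blurring" variant; the default black baseline corresponds to "no blurring".

```python
import numpy as np


def insertion_curve(image, saliency, predict_prob, baseline=None, n_steps=20):
    """Insert pixels from most to least salient and track the class probability.

    image:        (H, W) or (H, W, C) array, the original input.
    saliency:     (H, W) array, higher value = more important pixel.
    predict_prob: hypothetical callable mapping an image to a probability.
    baseline:     start image; defaults to a black image (assumption).
    Returns the inserted pixel fractions, the probabilities, and the AUC.
    """
    image = np.asarray(image, dtype=float)
    h, w = saliency.shape
    if baseline is None:
        baseline = np.zeros_like(image)

    # Flatten the spatial dimensions so pixels can be addressed by rank.
    flat_img = image.reshape(h * w, -1)
    flat_cur = np.array(baseline, dtype=float).reshape(h * w, -1).copy()

    # Pixel indices sorted from most to least important.
    order = np.argsort(saliency.ravel())[::-1]

    # Probability of the start image, before any pixel is inserted.
    probs = [predict_prob(flat_cur.reshape(image.shape))]
    bounds = np.linspace(0, h * w, n_steps + 1).astype(int)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        idx = order[start:stop]
        flat_cur[idx] = flat_img[idx]  # copy original pixels into the start image
        probs.append(predict_prob(flat_cur.reshape(image.shape)))

    probs = np.array(probs)
    fractions = bounds / (h * w)
    # Area under the curve via the trapezoidal rule.
    auc = float(np.sum((probs[1:] + probs[:-1]) / 2.0 * np.diff(fractions)))
    return fractions, probs, auc


# Toy usage: a stand-in "model" that only looks at the top-left 2x2 patch.
def toy_model(img):
    return float(img[:2, :2].mean())


toy_image = np.ones((4, 4, 1))
good_map = np.arange(16, dtype=float).reshape(4, 4)
good_map[:2, :2] += 100.0   # ranks the patch the toy model uses first
bad_map = -good_map         # ranks that patch last

_, probs_good, auc_good = insertion_curve(toy_image, good_map, toy_model, n_steps=4)
_, probs_bad, auc_bad = insertion_curve(toy_image, bad_map, toy_model, n_steps=4)
print(auc_good, auc_bad)
```

In this toy setting, the map that ranks the relevant patch first reaches the full probability after the first step and therefore gets the larger AUC, which is the sense in which the caption calls a saliency map "best".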
