Reliable Explainability of Deep Learning Spatial-Spectral Classifiers for Improved Semantic Segmentation in Autonomous Driving

Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe

TL;DR

The paper tackles the problem of explainability for spectral-spatial deep learning models used in semantic segmentation for autonomous driving. It argues that conventional CAM-based saliency methods fail to reliably reflect the decision process in segmentation and extends the conservativeness concept to segmentation to ensure faithful attributions. To improve explainability, it investigates activations and weights from key layers in HSI-enabled U-Net models trained on the HSI-Drive v2.0 dataset, comparing 1-, 3-, and 25-channel inputs with per-pixel normalization (PN). The findings show that while richer spectral information (25-channel PN) can enhance segmentation robustness for heterogeneous classes, it may weaken edge delineation, underscoring the need for segmentation-specific explainability tools and ongoing work to balance spectral richness with reliable, interpretable outputs in safety-critical driving systems.

Abstract

Integrating hyperspectral imagery (HSI) with deep neural networks (DNNs) can strengthen the accuracy of intelligent vision systems by combining spectral and spatial information, which is useful for tasks like semantic segmentation in autonomous driving. To advance research in such safety-critical systems, it is necessary to determine the precise contribution of spectral information to the output of complex DNNs. To this end, several saliency methods, such as class activation maps (CAM), have been proposed, primarily for image classification. However, recent studies have raised concerns regarding their reliability. In this paper, we address their limitations and propose an alternative approach that leverages the activations and weights of relevant DNN layers to better capture the relationship between input features and predictions. The study aims to assess the performance advantage of HSI-based DNNs over 3-channel and single-channel DNNs. We also address the influence of spectral signature normalization on DNN robustness in real-world driving conditions.
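
The idea of reading a prediction directly from activations and weights can be illustrated on a final 1x1 convolution, where each class logit at a pixel is a weighted sum of that pixel's input activations plus a bias. The sketch below is a minimal illustration and not the authors' implementation: the function names and the numeric values are hypothetical, and conservativeness is taken in the sense commonly used in the attribution literature (per-channel contributions plus the bias add up to the explained score). The paper's own definition for segmentation may differ in detail.

```python
def conv1x1_logits(activations, weights, biases):
    """Class logits of a 1x1 convolution at a single pixel.

    activations: list of C input-channel values at that pixel
    weights:     K x C nested list, one row of weights per class
    biases:      list of K bias values, one per class
    """
    return [sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)]


def channel_contributions(activations, weights):
    """Per-class, per-channel terms weights[k][c] * activations[c]."""
    return [[w * a for w, a in zip(row, activations)] for row in weights]


def is_conservative(activations, weights, biases, tol=1e-9):
    """Check that contributions plus bias reproduce every class logit."""
    logits = conv1x1_logits(activations, weights, biases)
    contribs = channel_contributions(activations, weights)
    return all(abs(sum(c) + b - z) <= tol
               for c, b, z in zip(contribs, biases, logits))


if __name__ == "__main__":
    # Hypothetical values: 3 input channels, 2 classes, one pixel.
    acts = [0.5, 2.0, 0.0]
    W = [[1.0, -0.5, 3.0],
         [0.0, 1.0, 1.0]]
    b = [0.1, -0.2]
    print(conv1x1_logits(acts, W, b))
    print(channel_contributions(acts, W))
    print(is_conservative(acts, W, b))
```

Because the layer is linear, this decomposition is exact by construction. It no longer holds once a nonlinearity or a spatial average sits between the quantities being attributed and the score being explained.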

Paper Structure

This paper contains 12 sections, 12 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Conservativeness property in semantic segmentation.
  • Figure 2: Effect of the SegGradCAM gradient weighting modification on the $Conv2D\_1$ activation from the HSI & PN model. Image 1111_576 from HSI-Drive v2.0. a) Pseudocolor version. b) Spatial average. c) Non-spatial average.
  • Figure 3: Inference result on image 3112_104 from HSI-Drive v2.0.
  • Figure 4: Weight (dots) and bias (dashed line) values of the 1x1 $conv2D\_22$ convolution layer for each class in the three models. Lines below the graph indicate the most correlated weights from other models, while the line above indicates the most correlated weights from the same model: orange (1-channel), turquoise (3-channel), and brown (25-channel with PN).
  • Figure 5: Selected outputs/inputs from block $conv2D\_21/22$ of the 1-channel (left), 3-channel (center), and 25-channel with PN (right) models. Numbers indicate the activation channel. Letters indicate highly correlated channels as shown in Fig. 4. a) Common activation for Veg., Sky and Others. b) Strong activation for Road. c) Strong activation for Marks. d) Homogeneity of Veg. under strong light contrast. e) Strong activation for Others. f) Common activation for Road and Marks.
  • ...and 1 more figure