SegXAL: Explainable Active Learning for Semantic Segmentation in Driving Scene Scenarios

Sriram Mandalika, Athira Nambiar

TL;DR

SegXAL addresses data efficiency and interpretability in driving-scene semantic segmentation by introducing an Explainable Active Learning framework that combines an Explainable Error Mask (EEM) with uncertainty and proximity-based explainability. It integrates Entropy-based Uncertainty (EBU) and Proximity-aware Explainable-AI (PAE) to prioritize informative, nearby regions for annotation, offering machine pseudolabel and human annotation modes guided by GradCAM-based explanations. Using a Dice-based sample selection and iterative retraining of a segmentation backbone (U-Net) on Cityscapes under a fixed labeling budget, SegXAL achieves state-of-the-art mIoU (up to 65.11) over multiple AL cycles. The approach enhances interpretability and data efficiency, enabling more reliable deployment for autonomous driving systems by providing human-friendly justifications for annotated regions and decisions.
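As a rough illustration of the entropy-based uncertainty idea mentioned above, the sketch below computes a per-pixel Shannon entropy map from softmax class probabilities. It is a minimal example under stated assumptions, not the authors' EBU module: the function names `pixel_entropy` and `normalized_entropy`, the `(C, H, W)` array layout, and the normalization by `log(C)` are all illustrative choices that do not come from the paper.

```python
import numpy as np


def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy per pixel for a (C, H, W) array of class probabilities.

    Hypothetical helper: the (C, H, W) layout and eps clipping are assumptions.
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=0)


def normalized_entropy(probs):
    """Scale entropy to [0, 1] by dividing by log(number of classes)."""
    num_classes = probs.shape[0]
    return pixel_entropy(probs) / np.log(num_classes)


# Toy input: 3 classes on a 1x2 image.
# Pixel (0, 0) is a confident prediction, pixel (0, 1) is maximally uncertain.
probs = np.array([
    [[1.0, 1 / 3]],
    [[0.0, 1 / 3]],
    [[0.0, 1 / 3]],
])
uncertainty = normalized_entropy(probs)
```

In this toy case the confident pixel receives an uncertainty close to 0 and the uniform pixel an uncertainty close to 1, so high-entropy regions would be the natural candidates to surface for annotation.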

Abstract

Most sophisticated AI models rely on large amounts of annotated data and heavy training to achieve high performance. However, several challenges hinder the deployment of AI models in "in-the-wild" scenarios: inefficient use of unlabeled data, lack of incorporation of human expertise, and lack of interpretability of the results. To mitigate these challenges, we propose a novel Explainable Active Learning (XAL) based semantic segmentation model, "SegXAL", that can (i) effectively utilize unlabeled data, (ii) facilitate the "Human-in-the-loop" paradigm, and (iii) augment the model decisions in an interpretable way. In particular, we investigate the application of the SegXAL model to semantic segmentation in driving scenes. The SegXAL model proposes the image regions that require labeling assistance from the oracle by means of explainable AI (XAI) and uncertainty measures in a weakly-supervised manner. Specifically, we propose a novel Proximity-aware Explainable-AI (PAE) module and an Entropy-based Uncertainty (EBU) module to obtain an Explainable Error Mask (EEM), which enables the machine teachers/human experts to provide intuitive reasoning behind the results and to give feedback to the AI system via an active learning strategy. Such a mechanism bridges the semantic gap between man and machine through collaborative intelligence, where humans and AI actively enhance each other's complementary strengths. A novel high-confidence sample selection technique based on the Dice similarity coefficient is also presented within the SegXAL framework. Extensive quantitative and qualitative analyses are carried out on the benchmark Cityscapes dataset. Results show that the proposed SegXAL outperforms other state-of-the-art models.
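For readers unfamiliar with the Dice similarity coefficient used in the sample selection step, the sketch below computes it for two binary masks and ranks candidate samples by score. This is only an illustration under assumptions: the abstract does not state which two masks are compared or how the ranking is thresholded, so the pairing used here and the helper names `dice_coefficient` and `rank_by_dice` are hypothetical.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a convention, assumed here).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def rank_by_dice(mask_pairs):
    """Return (indices sorted by descending Dice, list of Dice scores).

    Hypothetical ranking helper: each pair is assumed to hold two masks
    for the same sample, for instance a prediction and a reannotation.
    """
    scores = [dice_coefficient(a, b) for a, b in mask_pairs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy pairs: the first pair overlaps partially, the second is identical.
pair_partial = (np.array([[1, 1], [0, 0]]), np.array([[1, 0], [0, 0]]))
pair_identical = (np.array([[0, 1], [1, 0]]), np.array([[0, 1], [1, 0]]))
order, scores = rank_by_dice([pair_partial, pair_identical])
```

Here the partially overlapping pair scores 2/3 and the identical pair scores 1, so the identical pair is ranked first. A high-confidence selection rule could then keep only the top-ranked samples, although the exact criterion used by SegXAL is not given in this summary.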

Paper Structure (21 sections, 4 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Visual representation of the Explainable Active Learning for semantic segmentation (SegXAL) framework. The framework starts with an initial segmentation of unlabeled data, leveraging a pre-trained semantic segmentation deep neural network (e.g., U-Net). The Explainable Error Mask (EEM) module then computes the uncertainty measure and the proximity-aware XAI mask. Based on this EEM output, a machine or human expert (oracle) provides intuitive labeling feedback to the system. Finally, based on the Dice predictor-based query ranking mechanism, the reannotated data are used to update the labeled pool and retrain the model.
  • Figure 2: Proximity-aware Explainable-AI (PAE) module using the MiDaS depth estimation technique. Analogous to MiDaS, a DINOv2 depth map is also investigated in this paper.
  • Figure 3: Oracle's reannotation workflow. The magenta point shown in \ref{fig:revised-visuals}(d) is the EEM output prompt corresponding to the relevant object candidate to be annotated.
  • Figure 4: Visualization of model performance over 5 Active Learning cycles.
  • Figure 5: Visualization of machine-based pseudolabel vs. manual annotation outputs.
  • ...and 2 more figures