XAI-Enhanced Semantic Segmentation Models for Visual Quality Inspection

Tobias Clement, Truong Thanh Hung Nguyen, Mohamed Abdelaal, Hung Cao

TL;DR

The paper tackles the interpretability gap in visual quality inspection by introducing a CAM-based Explainable AI framework to refine semantic segmentation models, notably DeepLabv3-ResNet101, for more trustworthy defect detection. It details a four-stage workflow—training, CAM-based explanations, XAI evaluation, and annotation augmentation guided by explanations—applied to the TTPLA dataset in a cloud/mobile VQI setting. Through systematic evaluation, HiResCAM emerges as the most faithful and efficient XAI method, guiding an annotation-augmentation strategy that yields measurable IoU gains, particularly for challenging objects like cables. The results demonstrate that XAI-driven enhancements can improve segmentation quality while maintaining usability, paving the way for more transparent, robust VQI systems in manufacturing contexts.

Abstract

Visual quality inspection systems, crucial in sectors like manufacturing and logistics, employ computer vision and machine learning for precise, rapid defect detection. However, their unexplained nature can hinder trust, error identification, and system improvement. This paper presents a framework to bolster visual quality inspection by using CAM-based explanations to refine semantic segmentation models. Our approach consists of 1) Model Training, 2) XAI-based Model Explanation, 3) XAI Evaluation, and 4) Annotation Augmentation for Model Enhancement, informed by explanations and expert insights. Evaluations show XAI-enhanced models surpass original DeepLabv3-ResNet101 models, especially in intricate object segmentation.

Paper Structure

This paper contains 11 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Samples from the TTPLA dataset showing the main object categories: (a) cable, (b) tower_wooden, (c) tower_lattice, (d) tower_tucohy.
  • Figure 2: The enhanced VQI framework, which integrates XAI methods through four building blocks: (1) Training models, (2) Model Explanation with XAI, (3) XAI Evaluation, and (4) Model Improvement by XAI with Human-in-the-loop. End users interact with the framework via a web application.
  • Figure 3: Qualitative evaluation of the implemented XAI methods on the segmentation result of the DeepLabv3-ResNet101 model for a sample from the test set. The segmented category is tower_wooden, marked by the yellow box in the ground truth. The IoU between the segmentation and the ground truth is 0.9085.
  • Figure 4: List of input images, COCO annotations (ground truth), segmentation results of the DeepLabv3-ResNet101 model, and the HiResCAM explanations in increasing order of complexity.
  • Figure 5: Annotation augmentation methods include: (a) Increasing annotation size for slender objects such as cables, and (b) Adding annotations for easily-confused elements, like road markings, to help the model differentiate them from objects like white cables.
  • ...and 1 more figure
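
The captions for Figures 3 and 5 refer to two operations that are easy to illustrate: the IoU score between a predicted mask and its ground truth, and enlarging the annotation of a slender object such as a cable. The sketch below is only an illustration under stated assumptions. It treats masks as small binary grids, and it assumes that "increasing annotation size" can be approximated by a morphological dilation with a square neighbourhood. The function names `iou` and `dilate` and the `radius` parameter are hypothetical and are not taken from the paper, whose actual augmentation procedure is not described in this summary.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Grow every object pixel by `radius` pixels (square neighbourhood).

    Assumption: this stands in for "increasing annotation size";
    the paper may enlarge annotations differently.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


# A one-pixel-wide "cable" running across a 5x5 image.
cable = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
thick_cable = dilate(cable, radius=1)

print("pixels before/after:", sum(map(sum, cable)), sum(map(sum, thick_cable)))
print("IoU(original, enlarged):", iou(cable, thick_cable))
```

In this toy case the enlarged annotation covers three times as many pixels as the original. That gives a thin structure more weight during training, but it also lowers the IoU against the original thin mask, so any evaluation would need to state which version of the annotation serves as ground truth.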