Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography

Mohammed Salah, Naoufel Werghi, Davor Svetinovic, Yusra Abdulrahman

TL;DR

This work addresses the limitation of using single-modality representations (PCA or TSR) in pulse thermography by introducing PT-Fusion, a multi-modal attention-based fusion network. PT-Fusion employs two parallel CNN heads for PCA and TSR, together with two novel fusion blocks, the Encoder Attention Fusion Gate (EAFG) and the Attention Enhanced Decoding Block (AEDB), to adaptively fuse features for defect segmentation and depth estimation. A spatiotemporal data augmentation strategy expands the scarce PT dataset, enabling robust training. On the IRT-PVC dataset, PT-Fusion outperforms state-of-the-art methods (U-Net, attention U-Net, 3D-CNN) with an IoU of 0.882 for depth-aware segmentation and a depth MAE of 0.0082 cm, validating the effectiveness of multi-modal fusion for subsurface defect analysis in PT. The approach offers practical improvements for automated NDT in industrial settings and points to future work on other PT modalities and materials.
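
The two metrics quoted above can be made concrete with a short NumPy sketch. It assumes binary defect masks and per-pixel depth maps in centimetres; the function names `iou` and `depth_mae` are illustrative rather than taken from the paper, and the paper's depth-aware IoU may be computed differently (for example, per depth class).

```python
import numpy as np


def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (assumed same shape)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Both masks are empty: treat as perfect agreement (a convention, not from the paper).
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)


def depth_mae(pred_depth, true_depth, mask=None):
    """Mean absolute depth error, optionally restricted to pixels where mask is True."""
    err = np.abs(np.asarray(pred_depth, dtype=float) - np.asarray(true_depth, dtype=float))
    if mask is not None:
        err = err[np.asarray(mask, dtype=bool)]
    if err.size == 0:
        # No pixels selected: the error is undefined.
        return float("nan")
    return float(err.mean())
```

Restricting the MAE to defect pixels through `mask` is one possible design choice; whether the reported 0.0082 cm is averaged over defect pixels or over the whole image is not stated in this summary.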

Abstract

AI-driven pulse thermography (PT) has become a crucial tool in non-destructive testing (NDT), enabling automatic detection of hidden anomalies in various industrial components. Current state-of-the-art techniques compress PT sequences with either Principal Component Analysis (PCA) or Thermographic Signal Reconstruction (TSR) before feeding them to segmentation and depth estimation networks. However, treating these two modalities independently constrains the performance of PT inspection models, as the two representations carry complementary semantic features. To address this limitation, this work proposes PT-Fusion, a multi-modal attention-based fusion network that fuses the PCA and TSR modalities for segmentation and depth estimation of subsurface defects in PT setups. PT-Fusion introduces two novel feature fusion modules, the Encoder Attention Fusion Gate (EAFG) and the Attention Enhanced Decoding Block (AEDB), to fuse PCA and TSR features. In addition, a novel data augmentation technique based on random data sampling from thermographic sequences is proposed to alleviate the scarcity of PT datasets. The proposed method is benchmarked against state-of-the-art PT inspection models, including U-Net, attention U-Net, and 3D-CNN, on the Université Laval IRT-PVC dataset. The results demonstrate that PT-Fusion outperforms these models in defect segmentation and depth estimation accuracy by a margin of 10%.
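
To illustrate what the two compressed representations look like in practice, here is a minimal NumPy sketch of PCA and TSR applied to a thermographic sequence. It assumes the sequence is an array of shape `(frames, height, width)` holding strictly positive temperature rises, and that `times` holds the positive acquisition time of each frame. The function names, the number of components, and the polynomial degree are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def pca_images(seq, n_components=3):
    """Return the leading spatial principal-component images of a sequence."""
    t, h, w = seq.shape
    x = seq.reshape(t, h * w).astype(float)
    # Remove each pixel's temporal mean before decomposition.
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal spatial patterns, ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_components].reshape(n_components, h, w)


def tsr_coefficients(seq, times, degree=4, eps=1e-6):
    """Fit a polynomial to each pixel's log-temperature versus log-time curve.

    Returns an array of shape (degree + 1, height, width), highest power first.
    """
    t, h, w = seq.shape
    log_time = np.log(np.asarray(times, dtype=float))  # times must be > 0
    # Clip to eps so the logarithm stays finite for near-zero readings.
    log_temp = np.log(np.clip(seq.reshape(t, h * w).astype(float), eps, None))
    # np.polyfit fits all pixel columns at once when y is two-dimensional.
    coeffs = np.polyfit(log_time, log_temp, degree)
    return coeffs.reshape(degree + 1, h, w)
```

Both functions turn a long sequence into a handful of images, which is what makes them convenient network inputs. PCA summarises spatial variance across frames, while the TSR coefficients describe the shape of each pixel's cooling curve, consistent with the abstract's point that the two representations are complementary.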

Paper Structure (16 sections, 24 equations, 7 figures, 5 tables, 1 algorithm)


Figures (7)

  • Figure 1: PT setup involving flash lamps providing heat pulse to a specimen. Defective areas trap heat flow generating abnormal temperature distributions.
  • Figure 2: Generated thermographic representations, a) PCA and b) TSR, for sequence Z_013 in IRT-PVC dataset.
  • Figure 3: a) PT-Fusion network architecture. PCA and TSR images are fed to a shallow CNN head. b) The feature fusion module, Encoder Attention Fusion Gate (EAFG), is placed at the output of the encoders, and c) the Attention Enhanced Decoding Block (AEDB) is added to adaptively fuse encoded features with the decoders.
  • Figure 4: Encoder layer $m$ of the PT-Fusion CNN heads involving a residual convolution block and downsampling max pooling for feature extraction.
  • Figure 5: Simultaneous segmentation and depth estimation predictions by PT-Fusion for 5 samples a)-e). The segmentation and depth predictions, illustrated in the first and second rows, respectively, are compared against the ground truth in the third row.
  • ...and 2 more figures