Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography
Mohammed Salah, Naoufel Werghi, Davor Svetinovic, Yusra Abdulrahman
TL;DR
This work addresses the limitation of single-modality representations (PCA or TSR) in pulse thermography (PT) by introducing PT-Fusion, a multi-modal attention-based fusion network. PT-Fusion uses two parallel CNN heads for the PCA and TSR inputs and two novel fusion blocks, the Encoder Attention Fusion Gate (EAFG) and the Attention Enhanced Decoding Block (AEDB), to adaptively fuse features for defect segmentation and depth estimation. A spatiotemporal data augmentation strategy expands the scarce PT data, enabling robust training. On the IRT-PVC dataset, PT-Fusion outperforms state-of-the-art methods (U-Net, attention U-Net, 3D-CNN), reaching an IoU of 0.882 for depth-aware segmentation and a depth MAE of 0.0082 cm, which supports the effectiveness of multi-modal fusion for subsurface defect analysis in PT. The approach offers practical improvements for automated non-destructive testing (NDT) in industrial settings and points to future work on other PT modalities and materials.
Abstract
AI-driven pulse thermography (PT) has become a crucial tool in non-destructive testing (NDT), enabling automatic detection of hidden anomalies in various industrial components. Current state-of-the-art techniques feed segmentation and depth estimation networks with PT sequences compressed by either Principal Component Analysis (PCA) or Thermographic Signal Reconstruction (TSR). However, treating these two modalities independently constrains the performance of PT inspection models, because the two representations carry complementary semantic features. To address this limitation, this work proposes PT-Fusion, a multi-modal attention-based fusion network that combines the PCA and TSR modalities for segmentation and depth estimation of subsurface defects in PT setups. PT-Fusion introduces two novel feature fusion modules, the Encoder Attention Fusion Gate (EAFG) and the Attention Enhanced Decoding Block (AEDB), which fuse PCA and TSR features in the encoder and decoder, respectively. In addition, a novel data augmentation technique based on random sampling from thermographic sequences is proposed to alleviate the scarcity of PT datasets. The proposed method is benchmarked against state-of-the-art PT inspection models, including U-Net, attention U-Net, and 3D-CNN, on the Université Laval IRT-PVC dataset. The results demonstrate that PT-Fusion outperforms these models in defect segmentation and depth estimation accuracy by a margin of 10%.
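As a rough illustration of the two compressed representations named above, the following NumPy sketch turns a thermographic sequence into PCA component images and TSR coefficient images, and draws a random subset of frames in the spirit of the sampling-based augmentation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the (frames, height, width) array layout, the number of components, the polynomial degree, and the uniform frame-sampling rule are all hypothetical choices made for illustration.

```python
import numpy as np


def pca_compress(seq, n_components=3):
    """Project a (T, H, W) sequence onto its leading principal components.

    Returns an (n_components, H, W) stack of spatial component images.
    """
    t, h, w = seq.shape
    flat = seq.reshape(t, h * w).astype(np.float64)
    # Remove each pixel's temporal mean before the decomposition.
    flat = flat - flat.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal spatial patterns ordered by explained variance.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return vt[:n_components].reshape(n_components, h, w)


def tsr_compress(seq, times, degree=4):
    """Fit a polynomial to log-temperature versus log-time for every pixel.

    Returns a (degree + 1, H, W) stack of coefficient images,
    highest-order coefficient first (np.polyfit convention).
    """
    t, h, w = seq.shape
    log_time = np.log(times)
    # Clip to a small positive value so the logarithm is defined (assumption).
    log_temp = np.log(np.clip(seq.reshape(t, h * w), 1e-6, None))
    coeffs = np.polyfit(log_time, log_temp, degree)  # shape (degree + 1, H * W)
    return coeffs.reshape(degree + 1, h, w)


def sample_frames(seq, times, n_frames, rng):
    """Draw a random, time-ordered subset of frames (hypothetical augmentation)."""
    idx = np.sort(rng.choice(seq.shape[0], size=n_frames, replace=False))
    return seq[idx], times[idx]


# Synthetic example: every pixel cools as amplitude / sqrt(time).
rng = np.random.default_rng(0)
times = np.linspace(0.1, 10.0, 50)
amplitude = 1.0 + 0.1 * rng.random((8, 8))
sequence = amplitude[None, :, :] / np.sqrt(times)[:, None, None]

sub_seq, sub_times = sample_frames(sequence, times, n_frames=20, rng=rng)
pca_images = pca_compress(sub_seq, n_components=3)    # shape (3, 8, 8)
tsr_images = tsr_compress(sub_seq, sub_times, degree=4)  # shape (5, 8, 8)
```

Each call to `sample_frames` yields a different subsequence, so repeating the compression on several draws produces multiple PCA and TSR input pairs from a single recording. A fusion network such as the one described here would take the two image stacks as separate input branches.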
