Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning

Xin Zuo, Yu Sheng, Jifeng Shen, Yongwei Shan

TL;DR

MA-Q2L tackles multi-label sewer pipe defect recognition under severe data imbalance by integrating a mask-attention based feature enhancer with label-correlation learning in a Transformer-based encoder-decoder. A CAM-derived attention mask focuses the model on local defect regions, while a label-embedding self-attention captures inter-label relationships, aided by an asymmetric loss with targeted weights to address bottleneck categories. On a 1/16 subset of Sewer-ML, the method achieves near-state-of-the-art results, and on the full dataset it surpasses the previous best in F2 by a substantial margin, with added capability to provide rough defect localization heatmaps for practitioners. The approach delivers improved accuracy, efficiency, and interpretability for sewer condition assessment, enabling effective detection with less labeled data and offering actionable visualization for field engineers.
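
The asymmetric loss with per-class weights mentioned above can be illustrated with a minimal sketch. It follows the commonly used asymmetric-loss formulation for multi-label classification (separate focusing exponents for positive and negative labels, plus a probability margin that discards easy negatives). The function name, the default values of `gamma_pos`, `gamma_neg`, and `margin`, and the `class_weights` argument are assumptions made for illustration; the summary does not state the values or the exact weighting scheme used in MA-Q2L.

```python
import math

def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, class_weights=None, eps=1e-8):
    """Asymmetric loss for one sample (illustrative sketch, not the authors' code).

    probs         : per-class sigmoid probabilities in [0, 1]
    targets       : per-class binary labels (0 or 1)
    class_weights : hypothetical per-class weights, e.g. larger for
                    hard ("bottleneck") categories; defaults to all ones
    """
    weights = class_weights if class_weights is not None else [1.0] * len(probs)
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        if y == 1:
            # Positive label: focal-style down-weighting of confident positives.
            term = (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative label: shift the probability by the margin so that
            # very easy negatives (p <= margin) contribute nothing.
            p_m = max(p - margin, 0.0)
            term = p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
        total -= w * term
    return total
```

With these assumed defaults, a negative label predicted at probability 0.03 contributes zero loss, while doubling a class weight doubles that class's contribution.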

Abstract

The coexistence of multiple defect categories and substantial class imbalance significantly impair the detection of sewer pipeline defects. To address these problems, a multi-label pipe defect recognition method is proposed based on mask-attention-guided feature enhancement and label correlation learning. The proposed method achieves classification performance close to the current state of the art using just 1/16 of the Sewer-ML training dataset, and exceeds the current best method by 11.87% in terms of the F2 metric on the full dataset, demonstrating the superiority of the model. The major contribution of this study is a more efficient model for identifying and locating multiple defects in sewer pipe images, enabling more accurate sewer pipeline condition assessment. Moreover, by employing class activation maps, our method can accurately pinpoint multiple defect categories in the image, which demonstrates strong model interpretability. Our code is available at https://github.com/shengyu27/MA-Q2L.
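
For readers unfamiliar with the F2 metric, the sketch below shows the standard F-beta score with beta = 2, which weights recall more heavily than precision, and an optional weighted average over classes. The helper names and the `weights` argument are hypothetical; the class weighting used in the Sewer-ML benchmark is assumed here to be a per-class importance weighting and is not reproduced from the paper.

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts: (1+b^2)*TP / ((1+b^2)*TP + b^2*FN + FP)."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1.0 + b2) * tp / denom

def weighted_f2(y_true, y_pred, weights=None):
    """Average per-class F2 over a multi-label batch.

    y_true, y_pred : lists of per-sample binary label lists (same shape)
    weights        : hypothetical per-class weights; uniform if omitted
    """
    n_classes = len(y_true[0])
    weights = weights if weights is not None else [1.0] * n_classes
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        scores.append(fbeta(tp, fp, fn))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

Because beta = 2 puts four times as much weight on false negatives as on false positives, a missed defect lowers the score more than a false alarm does, which suits inspection settings where overlooking a defect is the costlier error.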

Paper Structure (31 sections, 14 equations, 14 figures, 7 tables)

Figures (14)

  • Figure 1: The method in this paper can roughly localize the defect information in the image. The given example contains two types of defects: FS (displaced joint) and RO (roots).
  • Figure 2: (a) Coexistence of multiple classes of defects obtained from the Sewer-ML dataset. (b) Some examples.
  • Figure 3: The overall structure of the proposed method, where H and W are the height and width of the image after feature extraction, C is the number of channels, and N is the number of dataset categories. In our model, H=14, W=14, C=2048, N=17.
  • Figure 4: The architecture of the decoder.
  • Figure 5: Distribution of the number of defect categories for the different dataset sizes, normalized using max-min.
  • ...and 9 more figures