Attention-Enhanced Prototypical Learning for Few-Shot Infrastructure Defect Segmentation

Christina Thrainer, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Christian Guetl, Steven Sloan, Kendall N. Niles, Ken Pathak

TL;DR

This work tackles the problem of few-shot semantic segmentation for infrastructure defects by integrating an Enhanced Feature Pyramid Network (E-FPN) with prototypical learning and attention mechanisms. The authors introduce InceptionSepConv-based multi-scale feature extraction, masked average pooling for robust prototypes, and three attention variants (Self, Local Self, Cross) to boost prototype quality. A two-stage training strategy—encoder pre-training followed by joint fine-tuning with bidirectional prototypical losses—yields strong performance across 9-way 5-shot and 2-way 5-shot configurations, with Self-Attention providing the largest gains. The framework demonstrates practical potential for rapid adaptation to new defect types with limited data, offering substantial benefits for efficient and economical infrastructure maintenance in real-world inspection systems.

Abstract

Few-shot semantic segmentation is vital for deep learning-based infrastructure inspection, where labeled training examples are scarce and expensive. Existing deep learning frameworks perform well, but they require extensive labeled datasets and cannot learn new defect categories from little data. We present an Enhanced Feature Pyramid Network (E-FPN) framework for few-shot semantic segmentation of culvert and sewer defect categories, built on prototypical learning. Our approach makes three main contributions: (1) an adaptive E-FPN encoder that uses InceptionSepConv blocks and depth-wise separable convolutions for efficient multi-scale feature extraction; (2) prototypical learning with masked average pooling, which generates powerful prototypes from a small number of support examples; and (3) attention-based feature representation through global self-attention, local self-attention, and cross-attention. Comprehensive experiments on challenging infrastructure inspection datasets show that the method achieves excellent few-shot performance: the best configuration, trained 8-way 5-shot, reaches 82.55% F1-score and 72.26% mIoU in 2-way classification testing. Self-attention yields the largest improvement, a gain of 2.57% F1-score and 2.9% mIoU over baselines. Our framework addresses the critical need to respond rapidly to new defect types in infrastructure inspection systems when little new training data is available, which can lead to more efficient and economical maintenance plans for critical infrastructure.
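
To make the efficiency claim for depth-wise separable convolutions concrete, the following sketch compares weight counts for a standard convolution and its depth-wise separable counterpart. The formulas are the standard ones for these layer types; the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are hypothetical and are not taken from the paper, and the sketch says nothing about the internal layout of the InceptionSepConv block.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard

print(standard, separable, round(ratio, 3))
```

For these assumed sizes the separable layer needs roughly one eighth of the weights of the standard layer, which is the usual motivation for using such layers in a multi-scale encoder.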

Paper Structure

This paper contains 30 sections, 16 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overall architecture of the proposed few-shot semantic segmentation framework. (a) Query-centric learning: prototypes generated from support features segment the query image, computing $\mathcal{L}_{query}$. (b) Support-centric learning: prototypes from query features segment support images, computing $\mathcal{L}_{support}$. The E-FPN encoder uses shared weights for both sets.
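
The two learning directions in Figure 1 can be illustrated with a minimal sketch, assuming prototypes are formed by masked average pooling and each position is assigned to the prototype with the highest cosine similarity. The function names, the toy 2x2 feature maps, and the use of the predicted query mask to build query prototypes are assumptions made for illustration, not details confirmed by the source. Plain Python lists stand in for encoder feature maps, and hard labels stand in for the losses $\mathcal{L}_{query}$ and $\mathcal{L}_{support}$.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask is 1.

    features: H x W grid of C-dimensional vectors (nested lists).
    mask:     H x W grid of 0/1 values for one class.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # class absent from the mask: zero prototype
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def prototypes_from(features, labels, num_classes):
    """One masked-average-pooled prototype per class index."""
    return [
        masked_average_pooling(
            features, [[1 if l == k else 0 for l in row] for row in labels]
        )
        for k in range(num_classes)
    ]


def segment(features, prototypes):
    """Label each position with the index of its most similar prototype."""
    return [
        [max(range(len(prototypes)), key=lambda k: cosine(vec, prototypes[k]))
         for vec in row]
        for row in features
    ]


# Toy 2x2 feature maps with 2 channels (hypothetical encoder outputs).
support_feats = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
support_labels = [[0, 0], [1, 1]]
query_feats = [[[0.8, 0.2], [0.2, 0.8]], [[0.7, 0.1], [0.0, 1.0]]]

# (a) Query-centric: support prototypes segment the query.
support_protos = prototypes_from(support_feats, support_labels, 2)
query_pred = segment(query_feats, support_protos)

# (b) Support-centric: prototypes built from the query features (here using
# the predicted query mask, an assumption) segment the support image.
query_protos = prototypes_from(query_feats, query_pred, 2)
support_pred = segment(support_feats, query_protos)

print(query_pred)
print(support_pred == support_labels)
```

In a trained model the similarity scores would feed differentiable losses rather than a hard argmax. The sketch only shows how the same pooling and matching steps run in both directions over features from a shared encoder.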