Attention-Enhanced Co-Interactive Fusion Network (AECIF-Net) for Automated Structural Condition Assessment in Visual Inspection

Chenyu Zhang; Zhaozheng Yin; Ruwen Qin

TL;DR

This work tackles automated structural condition assessment in visual bridge inspection by proposing AECIF-Net, a multi-task network that simultaneously segments structural elements and surface corrosion. The architecture combines a shared high-resolution encoder with task-specific relearning subnets and a co-interactive spatial-attention fusion module, and the task losses are balanced with Dynamic Weight Average. A new dataset, SBCIV, is introduced to benchmark multi-task vision for structural health monitoring (SHM). Ablation studies, quantitative comparisons, and a complexity analysis show state-of-the-art performance (element mIoU ≈ 92.11% and corrosion mIoU ≈ 87.16%) with a compact parameter footprint (~77.22M parameters versus ~131.70M for two separate single-task models). The approach offers robust, data-efficient multi-task visual inspection, advancing automated bridge health monitoring and enabling more reliable, scalable field assessments.
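
As a rough illustration of how Dynamic Weight Average balances two task losses, the sketch below follows the commonly used formulation, in which each task's weight is a softmax over the ratio of its two most recent epoch losses, scaled by the number of tasks. The function name `dwa_weights`, the temperature of 2.0, and the example loss values are assumptions made for illustration; this summary does not state the settings used for AECIF-Net.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Compute per-task loss weights with Dynamic Weight Average.

    prev_losses:      average loss of each task at epoch t-1
    prev_prev_losses: average loss of each task at epoch t-2
    temperature:      softness of the weighting (assumed value, not from the paper)
    """
    if len(prev_losses) != len(prev_prev_losses):
        raise ValueError("both loss lists must have one entry per task")
    num_tasks = len(prev_losses)
    # Relative descent rate: a ratio near 1 means the task loss barely improved.
    ratios = [cur / old for cur, old in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    # Scale the softmax so the weights sum to the number of tasks.
    return [num_tasks * e / total for e in exps]


# Hypothetical epoch losses: [element segmentation, corrosion segmentation].
weights = dwa_weights([0.5, 0.9], [1.0, 1.0])
print(weights)  # the slower-improving corrosion task receives the larger weight
```

The total training loss would then be the weighted sum of the two task losses. In the first two epochs, when no loss history exists yet, the weights are typically set to 1.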

Abstract

Efficiently monitoring the condition of civil infrastructure requires automating the structural condition assessment in visual inspection. This paper proposes an Attention-Enhanced Co-Interactive Fusion Network (AECIF-Net) for automatic structural condition assessment in visual bridge inspection. AECIF-Net can simultaneously parse structural elements and segment surface defects on the elements in inspection images. It integrates two task-specific relearning subnets to extract task-specific features from an overall feature embedding. A co-interactive feature fusion module further captures the spatial correlation and facilitates information sharing between tasks. Experimental results demonstrate that the proposed AECIF-Net outperforms the current state-of-the-art approaches, achieving promising performance with 92.11% mIoU for element segmentation and 87.16% mIoU for corrosion segmentation on the test set of the new benchmark dataset Steel Bridge Condition Inspection Visual (SBCIV). An ablation study verifies the merits of the designs for AECIF-Net, and a case study demonstrates its capability to automate structural condition assessment.
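
For readers unfamiliar with the mIoU metric reported above, the following minimal sketch computes mean intersection-over-union from flattened label maps. The helper name `mean_iou` and the toy labels are hypothetical. The paper's exact evaluation protocol (for example, whether the background class is included, or whether IoU is accumulated per image or over the whole test set) is not given in this summary, so this shows the generic definition only.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: equal-length sequences of integer class labels
                  (a segmentation map flattened to one dimension).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correctly labelled pixel counts toward both intersection and union.
            inter[p] += 1
            union[p] += 1
        else:
            # A mislabelled pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no pixels to evaluate")
    return sum(ious) / len(ious)


# Toy example with two classes (0 = no corrosion, 1 = corrosion).
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 4))  # class IoUs are 1/2 and 2/3, so the mean is 0.5833
```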

Paper Structure (28 sections, 15 equations, 10 figures, 3 tables)

Introduction
Related work
DCNN-based defect segmentation
DCNN-based structural element segmentation
MTL in visual structural assessment
AECIF-Net
Shared encoder
Task-specific feature relearning subnets
Co-interactive fusion module
Reconstruction decoder
Loss function
Experimental setup
Dataset and data augmentation
Data collection
Data annotation
...and 13 more sections

Figures (10)

Figure 1: Challenges in deep learning-based visual inspection: (a) potential confusion due to surface defects, where a rusted girder might be mistakenly identified as a bearing; (b) issues from surface inhomogeneity, shadows, and poor lighting affecting defect assessment; (c) overlooked spatial correlation between element segmentation and defect segmentation tasks. (Original images courtesy of bianchi2021coco.)
Figure 2: Framework of the automated visual bridge inspection using UAVs, where AECIF-Net analyzes collected images to segment structural elements and defects, leading to a comprehensive structural condition assessment. (Example images courtesy of Shengqian Zheng and bianchi2021coco.)
Figure 3: Architecture of the AECIF-Net, which features a share-split-interaction pipeline composed of a shared high-resolution deep encoder, two task-specific relearning subnets, and a co-interactive feature fusion module.
Figure 4: Illustration of bridge element classes that collectively define the structural area within inspection images. (Base images courtesy of bianchi2021coco and Bianchi2021.)
Figure 5: Data distributions of the SBCIV dataset, split by training, validation, and test: (a) by the number of element classes in an image, (b) by the number of elements in an image. The dataset contains a wide range of different examples that are important for teaching models to handle various situations.
...and 5 more figures
