EPBC-YOLOv8: An efficient and accurate improved YOLOv8 underwater detector based on an attention mechanism

Xing Jiang, Xiting Zhuang, Jisheng Chen, Jian Zhang

Abstract

In this study, we improve underwater target detection with four changes to YOLOv8: we integrate channel and spatial attention into the backbone, apply pointwise convolution (PWConv) in FasterNeXt to form the FasterPW model, use Weighted Concat in a BiFPN-inspired WFPN structure to improve cross-scale connections and robustness, and adopt CARAFE for refined feature reassembly. The resulting framework addresses underwater image degradation and achieves mAP@0.5 scores of 76.7% and 79.0% on the URPC2019 and URPC2020 datasets, respectively. These scores are 2.3% and 0.7% higher than those of the original YOLOv8, indicating improved precision in detecting marine organisms.
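The Weighted Concat idea mentioned in the abstract can be illustrated with a small sketch. It assumes the weights are normalized in the style of BiFPN's fast normalized fusion (clip each learnable weight at zero, then divide by the sum plus a small epsilon) and that each input feature map is scaled by its weight before channel-wise concatenation. The function names `normalized_weights` and `weighted_concat` and the nested-list feature layout are illustrative assumptions, not the authors' implementation.

```python
from typing import List

# Assumed layout for this sketch: [channels][height][width]
FeatureMap = List[List[List[float]]]


def normalized_weights(raw: List[float], eps: float = 1e-4) -> List[float]:
    """BiFPN-style fast normalized fusion: ReLU the weights, then normalize."""
    clipped = [max(w, 0.0) for w in raw]
    total = sum(clipped) + eps
    return [w / total for w in clipped]


def weighted_concat(
    features: List[FeatureMap], raw_weights: List[float], eps: float = 1e-4
) -> FeatureMap:
    """Scale each feature map by its normalized weight, then concatenate channels.

    Hypothetical helper: the paper's exact formulation may differ.
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    weights = normalized_weights(raw_weights, eps)
    out: FeatureMap = []
    for fmap, w in zip(features, weights):
        for channel in fmap:
            out.append([[w * v for v in row] for row in channel])
    return out


if __name__ == "__main__":
    a = [[[1.0, 2.0]]]                 # 1 channel, 1x2 spatial size
    b = [[[4.0, 4.0]], [[8.0, 0.0]]]   # 2 channels, same spatial size
    fused = weighted_concat([a, b], [1.0, 3.0], eps=0.0)
    print(len(fused), fused)
```

In a trained network the raw weights would be learnable parameters; here they are fixed numbers so the scaling is easy to follow by hand.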

Paper Structure

This paper contains 23 sections, 6 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: The structure of EPBC-YOLOv8.
  • Figure 2: Schematic diagram of EMA. Here, 'g' represents grouping, 'X Avg Pool' represents 1D horizontal global pooling, and 'Y Avg Pool' represents 1D vertical global pooling.
  • Figure 3: The structure of the C2f_EMA module. In the Bottleneck module, T/F indicates whether a shortcut is used. T stands for true, and F stands for false.
  • Figure 4: The construction process of the FasterPW structure. This structure follows a lightweight design: PWConv is first applied to the FasterPWBlock, and then multiple FasterPWBlocks and ConvModules are combined to construct an efficient, lightweight feature extraction network.
  • Figure 5: Comparison of different FPN structures, including the structures of (a) FPN, (b) PANet, (c) NASFPN, (d) SimplifiedPANet, (e) BiFPN and (f) WFPN networks.
  • ...and 6 more figures
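
The two directional pooling operations named in the Figure 2 caption can be sketched on a single channel. The sketch assumes that 'X Avg Pool' averages along the width, giving one value per row, and that 'Y Avg Pool' averages along the height, giving one value per column. This follows the usual coordinate-attention convention and is an assumption about the figure, not a statement of the authors' code. The function names are hypothetical.

```python
from typing import List

Channel = List[List[float]]  # a single H x W feature channel


def x_avg_pool(channel: Channel) -> List[float]:
    """1D horizontal global pooling: average over the width, one value per row."""
    return [sum(row) / len(row) for row in channel]


def y_avg_pool(channel: Channel) -> List[float]:
    """1D vertical global pooling: average over the height, one value per column."""
    height = len(channel)
    width = len(channel[0])
    return [sum(channel[r][c] for r in range(height)) / height for c in range(width)]


if __name__ == "__main__":
    channel = [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
    ]
    print(x_avg_pool(channel))  # one value per row (H values)
    print(y_avg_pool(channel))  # one value per column (W values)
```

Each call reduces an H x W channel to a 1D descriptor along one spatial axis, which is what lets the two branches encode position along the height and width separately.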