Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach

Xiaoxiao Ma, Junxiong Tong

TL;DR

We propose a small target detection method built on YOLOv5 that fuses infrared and visible-light images and adds a convolutional attention module to enhance detection performance.

Abstract

With the rapid development of information technology, modern warfare increasingly relies on intelligence, making small target detection critical in military applications. Demand for efficient, real-time detection is growing, yet small targets remain hard to identify in complex environments because of interference. To address this, we propose a small target detection method based on multi-modal image fusion and attention mechanisms. The method builds on YOLOv5, integrating infrared and visible-light data together with a convolutional attention module to enhance detection performance. The process begins with registration of the multi-modal dataset by feature point matching, so that the infrared and visible-light images are aligned before network training. By combining infrared and visible-light features with attention mechanisms, the model improves detection accuracy and robustness. Experimental results on the Anti-UAV and VisDrone datasets demonstrate the effectiveness and practicality of our approach, which achieves superior detection results for small and dim targets.

Paper Structure

This paper contains 18 sections, 5 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: $F_1$ curves on the VisDrone dataset for the original model (top) and the model with the attention module (bottom).
  • Figure 2: Precision-recall ($PR$) curves on the VisDrone dataset for the original model (top) and the model with the attention module (bottom).
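
For readers unfamiliar with the two curve types listed above, the following is a minimal sketch of how precision, recall, and $F_1 = 2PR/(P+R)$ can be computed at several confidence thresholds. It is not the paper's evaluation code: the function name `pr_f1_curve`, the detection scores, and the match labels are hypothetical values chosen for illustration, and the matching of detections to ground truth (for example by IoU) is assumed to have been done already.

```python
def pr_f1_curve(scores, is_true_positive, num_ground_truth, thresholds):
    """Return (threshold, precision, recall, f1) tuples, one per threshold.

    scores            -- confidence score of each detection (hypothetical data)
    is_true_positive  -- True if that detection matched a ground-truth box
    num_ground_truth  -- total number of ground-truth targets
    """
    curve = []
    for t in thresholds:
        # Keep only detections at or above the confidence threshold.
        kept = [tp for s, tp in zip(scores, is_true_positive) if s >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        # Assumed convention: precision is 1.0 when nothing is detected.
        precision = tp / (tp + fp) if kept else 1.0
        recall = tp / num_ground_truth if num_ground_truth else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        curve.append((t, precision, recall, f1))
    return curve


# Hypothetical detections: five boxes, four ground-truth targets.
scores = [0.9, 0.8, 0.6, 0.4, 0.3]
is_true_positive = [True, True, False, True, False]
curve = pr_f1_curve(scores, is_true_positive, num_ground_truth=4,
                    thresholds=[0.25, 0.5, 0.75])

for t, p, r, f1 in curve:
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}  F1={f1:.3f}")
```

Plotting $F_1$ against the threshold gives a curve of the kind shown in Figure 1, and plotting precision against recall gives one of the kind shown in Figure 2.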