Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration

Yifan Shao

TL;DR

This work proposes Local-Global Attention, a novel attention mechanism designed to better integrate local and global contextual features. It combines multi-scale convolutions with positional encoding, enabling the model to focus on local details while concurrently considering the broader global context.

Abstract

In recent years, attention mechanisms have significantly improved object detection performance by focusing on key feature information. However, prevalent methods still struggle to balance local and global features. This imbalance hampers their ability to capture both fine-grained details and broader contextual information, two critical elements for accurate object detection.

To address these challenges, we propose a novel attention mechanism, termed Local-Global Attention, designed to better integrate local and global contextual features. Specifically, our approach combines multi-scale convolutions with positional encoding, enabling the model to focus on local details while concurrently considering the broader global context. Additionally, we introduce learnable parameters that allow the model to dynamically adjust the relative importance of local and global attention according to the requirements of the task, thereby optimizing feature representations across multiple scales.

We have thoroughly evaluated the Local-Global Attention mechanism on several widely used object detection and classification datasets. Our experimental results demonstrate that this approach significantly enhances the detection of objects at various scales, with particularly strong performance on multi-class and small object detection tasks. Compared with existing attention mechanisms, Local-Global Attention consistently outperforms them across several key metrics while maintaining computational efficiency.
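
The idea described in the abstract can be illustrated with a toy one-dimensional sketch. It is not the authors' implementation: the function names, the sinusoidal positional encoding, the averaging kernels (stand-ins for learned convolution weights), the sigmoid gating, and the fixed scalars `w_local` and `w_global` (stand-ins for the learnable balancing parameters) are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation that keeps the input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def positional_encoding(length):
    """Hypothetical positional encoding: one sinusoidal value per position."""
    return [math.sin(math.pi * i / max(length - 1, 1)) for i in range(length)]


def local_global_attention(features, kernel_sizes=(3, 5),
                           w_local=0.5, w_global=0.5):
    """Gate `features` with a weighted mix of local and global attention.

    w_local and w_global are fixed here; in a trained model they would be
    learnable parameters (an assumption based on the abstract).
    """
    n = len(features)
    encoded = [f + p for f, p in zip(features, positional_encoding(n))]

    # Local branch: average the responses of several kernel sizes (multi-scale).
    local = [0.0] * n
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # averaging kernel, stand-in for learned weights
        response = conv1d_same(encoded, kernel)
        local = [a + r / len(kernel_sizes) for a, r in zip(local, response)]
    local_gate = [sigmoid(v) for v in local]

    # Global branch: a single gate computed from the mean over all positions.
    global_gate = sigmoid(sum(encoded) / n)

    # Mix the two gates and apply them to the original features.
    gates = [w_local * lg + w_global * global_gate for lg in local_gate]
    return [f * g for f, g in zip(features, gates)]
```

Calling `local_global_attention([1.0, 2.0, 3.0, 4.0])` returns a list of the same length in which each feature is scaled by a gate between 0 and 1. Setting `w_local=0.0, w_global=1.0` reduces the sketch to a purely global gate, which shows how the two weights shift the balance between local detail and global context.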

Paper Structure

This paper contains 25 sections, 15 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: mAP@50 and mAP@50:95 results on the COCO2017 [lin2015microsoft] and VOC2007 [pascal-voc-2007] datasets, comparing MobileNetV3 [howard2019searching] with its enhanced version using the Local-Global Attention mechanism. On COCO2017, all models were trained for 20 epochs using the Adam optimizer, while on VOC2007, training was conducted for 200 epochs with the AdamW optimizer. In both cases, other settings followed the YOLOv8 [yolov8_ultralytics] default configuration.
  • Figure 2: Structure diagram of the Local-Global Attention mechanism.
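
For readers unfamiliar with the two metrics named in Figure 1, the sketch below shows how they differ in their intersection-over-union (IoU) thresholds. mAP@50 counts a detection as correct when its IoU with the ground-truth box is at least 0.5, while mAP@50:95 averages over the ten thresholds 0.50, 0.55, ..., 0.95. The helper `hit_rate` is a hypothetical illustration of the threshold effect on a single box pair, not a full mAP computation, which also involves ranking detections by confidence and averaging precision per class.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds behind the two metrics.
MAP50_THRESHOLDS = [0.5]
MAP50_95_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def hit_rate(pred, truth, thresholds):
    """Hypothetical helper: fraction of thresholds at which `pred` matches `truth`."""
    score = iou(pred, truth)
    return sum(score >= t for t in thresholds) / len(thresholds)
```

For a ground-truth box `(0, 0, 100, 100)` and a prediction `(0, 0, 100, 82)`, the IoU is 0.82. The prediction counts as a match at the single mAP@50 threshold, but at only seven of the ten mAP@50:95 thresholds, which is why the stricter metric rewards tighter localization.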