BA-Net: Bridge Attention in Deep Neural Networks

Ronghui Zhang, Runzong Zou, Yue Zhao, Zirui Zhang, Junzhou Chen, Yue Cao, Chuan Hu, Houbing Song

TL;DR

This work extends the original bridge attention model (BAv1) with an adaptive selection operator that reduces information redundancy and optimizes the overall information exchange. The resulting model, BAv2, achieves substantial performance improvements on the ImageNet classification task.

Abstract

Attention mechanisms, particularly channel attention, have become highly influential in numerous computer vision tasks. Despite their effectiveness, many existing methods primarily focus on optimizing performance through complex attention modules applied at individual convolutional layers, often overlooking the synergistic interactions that can occur across multiple layers. In response to this gap, we introduce bridge attention, a novel approach designed to facilitate more effective integration and information flow between different convolutional layers. Our work extends the original bridge attention model (BAv1) by introducing an adaptive selection operator, which reduces information redundancy and optimizes the overall information exchange. This enhancement results in BAv2, which achieves substantial performance improvements on the ImageNet classification task, obtaining Top-1 accuracies of 80.49% and 81.75% with ResNet50 and ResNet101 as backbone networks, respectively. These results surpass the retrained baselines by 1.61% and 0.77%, respectively. Furthermore, BAv2 outperforms other existing channel attention techniques, such as the classical SENet101, exceeding its retrained performance by 0.52%. Additionally, integrating BAv2 into advanced convolutional networks and vision transformers has led to significant gains in performance across a wide range of computer vision tasks, underscoring its broad applicability.
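
The abstract reports BAv2 accuracies and margins but not the reference accuracies themselves. As a rough arithmetic check, the short sketch below backs out the implied reference numbers. It rests on two assumptions that the abstract does not state explicitly: the margins are absolute gaps in percentage points (not relative improvements), and the SENet101 comparison is made against the ResNet101-based BAv2 result. The dictionary keys and the function name are illustrative only.

```python
# Figures quoted in the abstract: (BAv2 Top-1 accuracy in %, reported margin).
reported = {
    "ResNet50 (retrained)": (80.49, 1.61),
    "ResNet101 (retrained)": (81.75, 0.77),
    # Assumption: SENet101 is compared against the ResNet101-based BAv2 result.
    "SENet101 (retrained)": (81.75, 0.52),
}


def implied_reference(bav2_top1, margin):
    """Back out the reference accuracy.

    Assumption: the margin is an absolute difference in percentage points.
    """
    return round(bav2_top1 - margin, 2)


implied = {name: implied_reference(acc, gap) for name, (acc, gap) in reported.items()}

for name, acc in implied.items():
    print(f"{name}: {acc:.2f}%")
```

Under these assumptions the implied retrained accuracies work out to 78.88% for ResNet50, 80.98% for ResNet101, and 81.23% for SENet101. These are derived values, not figures quoted from the paper.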

Paper Structure

This paper contains 20 sections, 11 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Comparison of state-of-the-art attention modules (SENet hu2018squeeze, FcaNet qin2021fcanet, BAv1 zhao2022ba and BAv2) applied on the ImageNet russakovsky2015imagenet dataset with ResNets he2016deep as backbones. The evaluation criteria include accuracy, network parameters and FLOPs. The size of each circle represents the FLOPs.
  • Figure 2: Attention map visualization for different convolutional layers in a Bottleneck block of ResNet50 he2016deep. (a) Images from the validation set of ImageNet. (b) The Grad-CAM selvaraju2017grad visualization of the Bottleneck block input. (c)-(e) The Grad-CAM visualizations of the outputs of the three convolutional layers in the Bottleneck block. (f) The Grad-CAM visualization of the Bottleneck block output.
  • Figure 3: Comparison of the block structures of the prevalent attention module and our bridge attention (BA) module. The BA module bridges the features from preceding convolutional layers, as indicated by the red arrows.
  • Figure 4: Overview of the Bridge Attention-v2 (BAv2) module. BAv2 dynamically combines squeezed features obtained from various convolutional layers using the improved bridge operation to generate channel weights.
  • Figure 5: The improved bridge operation in BAv2.
  • ...and 5 more figures
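
The Figure 4 caption describes BAv2 as combining squeezed features from several convolutional layers into channel weights. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes that "squeezing" means global average pooling, that each layer's squeezed vector is linearly projected to a common channel count, and that a softmax over per-layer logits stands in for the adaptive selection operator. The function names, `projections`, and `selection_logits` are all hypothetical.

```python
import math


def squeeze(feature_map):
    """Global average pooling over a [C][H][W] nested list: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def linear(vec, weight):
    """Project vec (length C_in) with a weight matrix of shape [C_out][C_in]."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bridge_channel_weights(feature_maps, projections, selection_logits):
    """Hypothetical bridge step: squeeze each layer, project to a common size,
    mix with softmax selection weights (an assumption), then apply a sigmoid."""
    squeezed = [linear(squeeze(f), p) for f, p in zip(feature_maps, projections)]
    alphas = softmax(selection_logits)
    out_channels = len(squeezed[0])
    combined = [
        sum(a * s[c] for a, s in zip(alphas, squeezed)) for c in range(out_channels)
    ]
    return [sigmoid(z) for z in combined]


# Toy example: two layers with 2 and 3 channels, each 2x2, bridged to 3 output channels.
layer_a = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
layer_b = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]], [[0.5, 0.5], [0.5, 0.5]]]
proj_a = [[0.1, 0.2], [0.0, 0.3], [-0.1, 0.1]]                   # shape [3][2]
proj_b = [[0.2, 0.0, 0.1], [0.0, 0.1, 0.0], [0.3, -0.2, 0.4]]    # shape [3][3]

weights = bridge_channel_weights([layer_a, layer_b], [proj_a, proj_b], [0.0, 0.0])
print(weights)
```

In a real network the projections and selection logits would be learned parameters, and the resulting weights would rescale the channels of the block output, as in squeeze-and-excitation style attention.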