EMBANet: A Flexible Efficient Multi-branch Attention Network

Keke Zu, Hu Zhang, Jian Lu, Lei Zhang, Chen Xu

TL;DR

This paper addresses the need for flexible, scalable multi-scale feature representation in CNNs by introducing the Multi-branch and Concat (MBC) module, which adds degrees of freedom (DoF) through an adjustable choice of transformation operator and number of branches. Coupling MBC with an attention mechanism yields the Multi-branch Attention (MBA) module and the Efficient Multi-branch Attention (EMBA) block, which are stacked to form the EMBANet backbone for classification, detection, and segmentation. The core contributions are two MBC variants (Multiplex and Concat, MUC; Split and Concat, SPC), a unified MBA framework, and a family of EMBANet models (Small/Large; MUC/SPC variants) reported to improve on strong baselines while remaining efficient. The authors argue that the DoF perspective improves multi-scale feature extraction and cross-channel recalibration and applies broadly to mainstream CNN backbones and downstream tasks, and they suggest NAS-based strategies as future work for tuning the DoF automatically per task.

Abstract

This work presents a novel module, namely multi-branch concat (MBC), to process the input tensor and obtain a multi-scale feature map. The proposed MBC module brings new degrees of freedom (DoF) to the design of attention networks by allowing the type of transformation operator and the number of branches to be flexibly adjusted. Two important transformation operators, multiplex and split, are considered in this work, both of which can represent multi-scale features at a more granular level and increase the range of receptive fields. By integrating the MBC and an attention module, a multi-branch attention (MBA) module is developed to capture the channel-wise interaction of feature maps and establish long-range channel dependency. By substituting the 3$\times$3 convolutions in the bottleneck blocks of ResNet with the proposed MBA, a novel block, namely the efficient multi-branch attention (EMBA) block, is obtained, which can be easily plugged into state-of-the-art backbone CNN models. Furthermore, a new backbone network called EMBANet is established by stacking the EMBA blocks. The proposed EMBANet is extensively evaluated on representative computer vision tasks, including classification, detection, and segmentation, and it demonstrates consistently superior performance over popular backbones.
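The pipeline the abstract describes (split the input into branches, transform each branch at its own scale, concatenate, then recalibrate channels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `box_filter`, `split_and_concat`, and `channel_attention` are hypothetical, the fixed box filters stand in for learned convolutions, the kernel sizes (1, 3, 5, 7) are assumed for illustration, and the SE-style gate is one assumed choice of attention module.

```python
import numpy as np

def box_filter(x, k):
    """Average each (C, H, W) channel over a k x k window (k odd), edge-padded."""
    assert k % 2 == 1, "kernel size must be odd"
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    h, w = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def split_and_concat(x, num_branches, kernel_sizes):
    """Split channels into branches, filter each at its own scale, concatenate.

    Assumption: a fixed box filter replaces the learned per-branch convolution.
    """
    assert x.shape[0] % num_branches == 0, "channels must divide evenly"
    assert len(kernel_sizes) == num_branches
    branches = np.split(x, num_branches, axis=0)
    outputs = [box_filter(b, k) for b, k in zip(branches, kernel_sizes)]
    return np.concatenate(outputs, axis=0)

def channel_attention(x, w1, w2):
    """SE-style gate: global average pool, two-layer MLP, sigmoid, rescale."""
    squeezed = x.mean(axis=(1, 2))                 # shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)        # ReLU, shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid, shape (C,)
    return x * gate[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))               # (channels, height, width)
features = split_and_concat(x, num_branches=4, kernel_sizes=(1, 3, 5, 7))
w1 = rng.standard_normal((2, 8))                   # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((8, 2))
out = channel_attention(features, w1, w2)
print(out.shape)
```

The number of branches and the per-branch kernel sizes are the two knobs that correspond to the adjustable DoF in this sketch. Swapping `channel_attention` for a different gate leaves the multi-branch part unchanged.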

Paper Structure (29 sections, 13 equations, 11 figures, 10 tables)

Figures (11)

  • Figure 1: The EMBANet block as a unified framework for attention networks. Because the types of feature extraction operators and attention modules can be freely adjusted, the block supports flexible network architectures and thereby provides extra DoF.
  • Figure 2: The architecture of the proposed EMBANet, which has four hierarchical stages, each with a stack of EMBANet blocks and preceded by a downsampling operator. In particular, the stem unit consists of a 7$\times$7 convolution with stride 2 and a maxpooling operator. The quantity $L_{i}$ represents the number of EMBANet blocks at stage $i$, with the default set as (3, 4, 6, 3), and the classifier consists of an average pooling operator followed by a fully connected layer.
  • Figure 3: An illustration of the SE module.
  • Figure 4: An illustration of the proposed MBC with $S$ branches, where the concat function is used to concatenate features in the channel dimension.
  • Figure 5: An illustration of the proposed MUC with the multiplexing rate $S$ set to 4, where $K$ is the kernel size and $G$ is the group size.
  • ...and 6 more figures