CBAM: Convolutional Block Attention Module

Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon

TL;DR

This paper introduces CBAM, a lightweight plug-in attention module for CNNs that refines intermediate feature maps via sequential channel and spatial attention. Channel attention uses both average- and max-pooled descriptors with a shared MLP to produce a channel map, while spatial attention uses channel-pooled descriptors and a 7x7 convolution to produce a spatial map. The authors demonstrate strong, consistent improvements across ImageNet-1K, MS COCO, and VOC 2007 over many backbones with negligible overhead, and provide Grad-CAM visualizations showing better localization. Overall, CBAM offers a general, efficient mechanism to boost representation power in CNNs, supporting broader adoption as a modular component.

Abstract

We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
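
The refinement described above (a channel map from average- and max-pooled descriptors passed through a shared MLP, then a spatial map from channel-pooled descriptors passed through a 7x7 convolution, each multiplied into the feature map) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names, the bias-free two-layer MLP with ReLU, the sigmoid gating, the zero padding, and the reduction ratio `r` are all choices made here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """Channel map of shape (C,) for a feature map of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the weights of the shared MLP.
    Assumption: no biases and a ReLU hidden layer, for brevity.
    """
    avg_desc = feat.mean(axis=(1, 2))  # average-pooled descriptor, (C,)
    max_desc = feat.max(axis=(1, 2))   # max-pooled descriptor, (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)

    # The same MLP is applied to both descriptors; outputs are summed.
    return sigmoid(mlp(avg_desc) + mlp(max_desc))

def spatial_attention(feat, kernel):
    """Spatial map of shape (H, W) for a feature map of shape (C, H, W).

    kernel: (2, k, k) convolution weights with odd k (k = 7 in the text).
    Assumption: zero padding so the map keeps the input's H x W size.
    """
    # Pool along the channel axis and stack the two resulting maps.
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = feat.shape[1:]
    out = np.empty((height, width))
    for i in range(height):
        for j in range(width):
            # Cross-correlation, as in deep learning "convolution" layers.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]

# Usage with random weights (hypothetical sizes; r = 4 is an assumption).
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
refined = cbam(feat, w1, w2, kernel)
print(refined.shape)  # (8, 5, 5)
```

Because both maps pass through a sigmoid in this sketch, every gate lies strictly between 0 and 1, so the module can only rescale activations and leaves the feature map's shape unchanged. That is what lets it be dropped between existing layers.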

Paper Structure

This paper contains 19 sections, 3 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The overview of CBAM. The module has two sequential sub-modules: channel and spatial. The intermediate feature map is adaptively refined through our module (CBAM) at every convolutional block of deep networks.
  • Figure 2: Diagram of each attention sub-module. As illustrated, the channel sub-module utilizes both max-pooling outputs and average-pooling outputs with a shared network; the spatial sub-module utilizes two similar outputs, pooled along the channel axis, and forwards them to a convolution layer.
  • Figure 3: CBAM integrated with a ResBlock in ResNet (He et al., 2016). This figure shows the exact position of our module when integrated within a ResBlock. We apply CBAM on the convolution outputs in each block.
  • Figure 4: Error curves during ImageNet-1K training. Best viewed in color.
  • Figure 5: Grad-CAM (Selvaraju et al., 2017) visualization results. We compare the visualization results of CBAM-integrated network (ResNet50 + CBAM) with baseline (ResNet50) and SE-integrated network (ResNet50 + SE). The Grad-CAM visualization is calculated for the last convolutional outputs. The ground-truth label is shown on the top of each input image and P denotes the softmax score of each network for the ground-truth class.
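
The placement described in the Figure 3 caption, with the attention module applied to a block's convolution outputs before the skip connection is added, can be sketched as below. This is an illustrative sketch only: `residual_block`, `conv_branch`, and `attention` are hypothetical names, the two lambdas are toy stand-ins for a real convolution stack and a real CBAM module, and the ReLU that standard ResNet blocks apply after the addition is omitted for brevity.

```python
import numpy as np

def residual_block(x, conv_branch, attention):
    """Return x + attention(conv_branch(x)) for a (C, H, W) feature map.

    The attention module refines the convolution outputs; the identity
    shortcut bypasses it and is added afterwards.
    """
    refined = attention(conv_branch(x))
    return x + refined

# Toy stand-ins (assumptions, not real layers):
conv_branch = lambda t: 0.5 * t  # placeholder for the block's convolutions
# A simple per-channel sigmoid gate in place of a full attention module.
attention = lambda t: t / (1.0 + np.exp(-t.mean(axis=(1, 2), keepdims=True)))

x = np.ones((4, 3, 3))
y = residual_block(x, conv_branch, attention)
print(y.shape)  # (4, 3, 3)
```

Keeping the shortcut outside the attention path means the block still passes its input through unchanged when the gate suppresses the convolution branch.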