CBAM: Convolutional Block Attention Module
Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon
TL;DR
This paper introduces CBAM, a lightweight plug-in attention module for CNNs that refines intermediate feature maps by applying channel attention followed by spatial attention. Channel attention feeds average- and max-pooled descriptors through a shared MLP to produce a channel map; spatial attention pools along the channel axis and applies a 7x7 convolution to the pooled descriptors to produce a spatial map. The authors report consistent improvements on ImageNet-1K, MS COCO, and VOC 2007 across many backbones with negligible overhead, and provide Grad-CAM visualizations showing better localization. Overall, CBAM is a general, efficient way to increase the representational power of CNNs, and it is designed to be dropped into existing architectures as a modular component.
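The two attention steps summarized above can be illustrated with a minimal sketch. It uses plain Python lists so that it runs without any dependencies; a real implementation would use a tensor library. Several details are assumptions not stated in this summary: the two MLP outputs are summed before the sigmoid, the MLP has a ReLU hidden layer and no biases, the reduction ratio `r = 2` and the random weights are purely illustrative, and the function names are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(desc, w1, w2):
    # Two-layer MLP (ReLU hidden layer, no biases: an assumption) on a length-C descriptor.
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    # x is C x H x W. Returns one weight in (0, 1) per channel.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    # The same MLP weights process both descriptors; summing them is an assumption.
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(p + q) for p, q in zip(a, m)]


def spatial_attention(x, kernel):
    # x is C x H x W, kernel is 2 x k x k. Returns an H x W map of weights in (0, 1).
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    pad = k // 2
    # Pool along the channel axis: one mean map and one max map.
    avg = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(x[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = [avg, mx]
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            # k x k convolution over the two pooled maps, zero-padded at the borders.
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += kernel[p][di][dj] * pooled[p][ii][jj]
            out[i][j] = sigmoid(acc)
    return out


def cbam(x, w1, w2, kernel):
    # Sequential refinement: scale by the channel map first, then by the spatial map.
    mc = channel_attention(x, w1, w2)
    x1 = [[[mc[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
    ms = spatial_attention(x1, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in x1]


# Illustrative sizes and random weights (not values from the paper).
random.seed(0)
C, H, W, r, k = 4, 8, 8, 2, 7
x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(C // r)] for _ in range(C)]
kernel = [[[random.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)] for _ in range(2)]

y = cbam(x, w1, w2, kernel)
print(len(y), len(y[0]), len(y[0][0]))  # same C, H, W as the input
```

Because both attention maps pass through a sigmoid, every output value is the corresponding input value scaled by a factor between 0 and 1, and the output keeps the input's shape. That shape preservation is what allows the module to be inserted between existing layers without changing the surrounding architecture.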
Abstract
We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial; the attention maps are then multiplied with the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architecture seamlessly with negligible overhead and is end-to-end trainable along with the base CNN. We validate CBAM through extensive experiments on the ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performance with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
