MCA: Moment Channel Attention Networks

Yangbo Jiang; Zhiwei Jiang; Le Han; Zenan Huang; Nenggan Zheng

TL;DR

This paper proposes the Moment Channel Attention (MCA) framework, which efficiently incorporates multiple levels of moment-based information while minimizing additional computation costs through the authors' Cross Moment Convolution (CMC) module.

Abstract

Channel attention mechanisms endeavor to recalibrate channel weights to enhance the representational ability of networks. However, mainstream methods often rely solely on global average pooling as the feature squeezer, which significantly limits the overall potential of models. In this paper, we investigate the statistical moments of feature maps within a neural network. Our findings highlight the critical role of high-order moments in enhancing model capacity. Consequently, we introduce a flexible and comprehensive mechanism termed Extensive Moment Aggregation (EMA) to capture the global spatial context. Building upon this mechanism, we propose the Moment Channel Attention (MCA) framework, which efficiently incorporates multiple levels of moment-based information while minimizing additional computation costs through our Cross Moment Convolution (CMC) module. The CMC module uses a channel-wise convolution layer to capture multiple-order moment information as well as cross-channel features. The MCA block is designed to be lightweight and easily integrated into a variety of neural network architectures. Experimental results on classical image classification, object detection, and instance segmentation tasks demonstrate that our proposed method achieves state-of-the-art results, outperforming existing channel attention methods.

Paper Structure (20 sections, 1 theorem, 10 equations, 3 figures, 7 tables)

This paper contains 20 sections, 1 theorem, 10 equations, 3 figures, 7 tables.

Introduction
Related Work
Attention Mechanisms
Moment Statistics for Deep Learning
The Proposed Approach
Extensive Moment Aggregation Mechanism
Moment Channel Attention Block
Experiments
Implementation Details
Ablation Study
Object Detection on COCO Dataset
Instance Segmentation on COCO Dataset
Image Classification on ImageNet Dataset
ResNet
ShufflenetV2
...and 5 more sections

Key Result

Proposition 1

Let $X$ be a bounded random vector with probability distribution $p$ on the compact interval $[a, b]^{N}$. Then, for all positive integers $k$, where $M_{k}(X) = E((X - E(X))^{k})$ is the vector of all $k$-th order moments of the marginal distributions of $p$, and $\alpha_{k} \in (0, 1)$ act as weighting parameters.
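The moment definition $M_{k}(X) = E((X - E(X))^{k})$ can be made concrete with a small empirical sketch. The function name `central_moment` and the two sample "channels" are hypothetical and chosen only for illustration; they are not taken from the paper. Note that under this formula the first central moment is always zero, so the mean has to be computed separately. The example shows two sets of activations that agree in mean and variance but differ in their third-order moment, which is the kind of difference that lower-order statistics alone cannot see.

```python
from statistics import fmean

def central_moment(values, k):
    """Empirical k-th central moment: the mean of (x - mean(x)) ** k."""
    mu = fmean(values)
    return fmean([(v - mu) ** k for v in values])

# Two hypothetical channels of activations (illustrative values only).
symmetric = [-2, -1, -1, 1, 1, 2]
skewed = [-1, -1, -1, -1, 2, 2]

# Both channels have mean 0 and second-order central moment 2 ...
mean_sym, mean_skw = fmean(symmetric), fmean(skewed)
m2_sym, m2_skw = central_moment(symmetric, 2), central_moment(skewed, 2)

# ... but only the third-order moment tells them apart.
m3_sym, m3_skw = central_moment(symmetric, 3), central_moment(skewed, 3)
```

A squeezer that keeps only the mean, or only the mean and variance, would assign these two channels the same descriptor.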

Figures (3)

Figure 1: The activations of a feature map exhibit distinct probability distributions, as illustrated in (a) and (b). The first-order moment alone is inadequate to represent a standard Gaussian distribution, as shown in (c), and combining the first- and second-order moments falls short in capturing non-Gaussian distributions, as depicted in (d). The extensive moment aggregation mechanism offers a viable solution to this challenge.
Figure 2: Diagram of the moment channel attention (MCA) network. We design an extensive moment aggregation (EMA) mechanism to capture global spatial features, while the cross moment convolution (CMC) method facilitates interactions between lower-order and other-order moments as well as between different channels. The parameter $\alpha$ denotes a learnable factor for each moment and $\sigma$ denotes the sigmoid function. $K$ indicates the order of moments; for MCA-E and MCA-S, $K$ is set to 2.
Figure 3: Results of MCA-E and MCA-S with kernel sizes $k$ from 3 to 11, using ResNet-50 as the backbone model. The ECA method serves as the baseline.
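The data flow described in the Figure 2 caption (moment aggregation, a learnable factor $\alpha$ per moment, a convolution along the channel axis with kernel size $k$, then a sigmoid) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the fixed kernel values, the zero padding, and the choice to give each moment its own kernel and sum the responses are all hypothetical. The sketch uses $K = 2$, taking the per-channel mean and variance as the two moments.

```python
import math

def channel_moments(feature_map):
    """Per-channel mean and variance of a C x N map (N = H * W, flattened)."""
    means, variances = [], []
    for channel in feature_map:
        mu = sum(channel) / len(channel)
        means.append(mu)
        variances.append(sum((v - mu) ** 2 for v in channel) / len(channel))
    return means, variances

def conv1d_same(signal, kernel):
    """1D cross-correlation along the channel axis, zero padded (odd kernel)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def moment_channel_attention(feature_map, alphas, kernels):
    """Illustrative forward pass with K = 2 moments.

    Assumption: each moment is scaled by its alpha, convolved with its own
    kernel across channels, and the responses are summed before the sigmoid.
    """
    moments = channel_moments(feature_map)
    logits = [0.0] * len(feature_map)
    for alpha, moment, kernel in zip(alphas, moments, kernels):
        response = conv1d_same([alpha * m for m in moment], kernel)
        logits = [l + r for l, r in zip(logits, response)]
    weights = [sigmoid(l) for l in logits]
    rescaled = [[w * v for v in ch] for w, ch in zip(weights, feature_map)]
    return weights, rescaled

# Hypothetical 4-channel feature map with 4 spatial positions per channel.
feature_map = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
    [4.0, 0.0, 4.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
alphas = (0.5, 0.5)                              # stand-ins for learned factors
kernels = ([0.2, 0.6, 0.2], [0.2, 0.6, 0.2])     # stand-ins for learned k = 3 kernels
weights, rescaled = moment_channel_attention(feature_map, alphas, kernels)
```

Summing per-moment convolutions is equivalent to a single 1D convolution with two input channels and one output channel, so the block adds only a handful of parameters per moment, in keeping with the lightweight design the paper describes.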

Theorems & Definitions (3)

Definition 1: Extensive Moment Aggregation
Proposition 1: Upper Bound
Proof 1
