FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off

Tianyu Zhang, Fan Wan, Haoran Duan, Kevin W. Tong, Jingjing Deng, Yang Long

TL;DR

FMDConv addresses the speed-accuracy trade-off in dynamic convolution with a lightweight block that combines input attention, temperature-degraded kernel attention, and output attention. It also proposes two standardized metrics for evaluating that trade-off: the Inverse Efficiency Score (IES) and the Rate-Correct Score (RCS). The method reduces computational cost (FLOPs) on ResNet backbones, by up to 49.8% on ResNet-18 and 42.2% on ResNet-50 relative to prior multi-attention dynamic convolution methods, while maintaining competitive accuracy on CIFAR-10, CIFAR-100, and ImageNet, and it achieves a better efficiency-accuracy balance than CondConv, DynamicConv, and ODConv. The paper reports extensive experiments and ablations, along with a temperature scheduling strategy that improves convergence. Together, these contributions make dynamic convolution more practical to deploy in resource-constrained environments and support a standardized way of evaluating speed against accuracy.

Abstract

Spatial convolution is fundamental in constructing deep Convolutional Neural Networks (CNNs) for visual recognition. While dynamic convolution enhances model accuracy by adaptively combining static kernels, it incurs significant computational overhead, limiting its deployment in resource-constrained environments such as federated edge computing. To address this, we propose Fast Multi-Attention Dynamic Convolution (FMDConv), which integrates input attention, temperature-degraded kernel attention, and output attention to optimize the speed-accuracy trade-off. FMDConv achieves a better balance between accuracy and efficiency by selectively enhancing feature extraction with lower complexity. Furthermore, we introduce two novel quantitative metrics, the Inverse Efficiency Score and Rate-Correct Score, to systematically evaluate this trade-off. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate that FMDConv reduces the computational cost by up to 49.8% on ResNet-18 and 42.2% on ResNet-50 compared to prior multi-attention dynamic convolution methods while maintaining competitive accuracy. These advantages make FMDConv highly suitable for real-world, resource-constrained applications.
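The abstract names the two metrics but does not define them. As a rough illustration only, the sketch below uses the conventional definitions from the speed-accuracy trade-off literature: IES as cost divided by the proportion correct (lower is better), and RCS as the number of correct responses divided by the total cost (higher is better). Using a generic `cost` argument (for example GFLOPs or latency) is an assumption, as are the function names and the numbers in the example; the paper's exact formulation may differ.

```python
def inverse_efficiency_score(cost, accuracy):
    """Conventional IES: cost per unit of accuracy (lower is better).

    `cost` is any positive cost measure (assumed here: GFLOPs or latency);
    `accuracy` is the proportion of correct predictions in (0, 1].
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost / accuracy


def rate_correct_score(num_correct, total_cost):
    """Conventional RCS: correct predictions per unit of cost (higher is better)."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    if num_correct < 0:
        raise ValueError("num_correct must be non-negative")
    return num_correct / total_cost


if __name__ == "__main__":
    # Hypothetical numbers, not results from the paper.
    print(inverse_efficiency_score(cost=2.0, accuracy=0.8))
    print(rate_correct_score(num_correct=80, total_cost=200.0))
```

Both scores fold accuracy and cost into a single number, which is what allows models with different accuracy and different compute budgets to be ranked on one axis.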

Paper Structure

This paper contains 15 sections, 6 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the FMDConv framework. The diagram illustrates the three attention mechanisms (Input, TD Kernel, and Output Attention) in FMDConv, each targeting a distinct stage of feature extraction for optimal efficiency and accuracy.
  • Figure 2: The architecture of the Fast Multi-Attention Dynamic Convolution (FMDConv) block. It integrates three attention mechanisms: Input Attention, Temperature-Degraded (TD) Kernel Attention, and Output Attention. These attentions are computed via Sigmoid and SoftMax functions to adjust feature maps and convolution kernels dynamically.
  • Figure 3: (a) Comparison of attention mechanisms on CIFAR-10; (b) Inverse Efficiency Score (IES); (c) Rate-Correct Score (RCS).
  • Figure 4: Grad-CAM++ visualization results for multiple attention mechanisms on ImageNet. (a) Original Images, (b) Feature Maps with Input Attention, (c) Feature Maps with Temperature-Degraded (TD) Kernel Attention, (d) Feature Maps with Output Attention, and (e) Feature Maps with All Combined Attentions.
  • Figure 5: Top-1 and Top-5 accuracy comparison for FMDConv under different initial temperatures on CIFAR-100.
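
The captions for Figures 2 and 5 mention Sigmoid and SoftMax attentions and an initial temperature. The sketch below shows, in plain Python, how a temperature-scaled softmax can weight a set of static kernels and how a temperature might be annealed during training. It is a minimal illustration of the general dynamic-convolution idea, not the paper's implementation: the function names, the linear schedule, and the default values (30 down to 1 over 10 epochs) are all assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, giving a gate value in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; a higher temperature gives flatter weights."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_temperature(epoch, start=30.0, end=1.0, anneal_epochs=10):
    """Hypothetical schedule: decay linearly from `start` to `end`, then hold."""
    if epoch >= anneal_epochs:
        return end
    return start - (start - end) * epoch / anneal_epochs


def mix_kernels(kernels, weights):
    """Weighted sum of K static kernels, each given as a flat list of floats."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    size = len(kernels[0])
    return [sum(w * k[i] for k, w in zip(kernels, weights)) for i in range(size)]


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]              # hypothetical kernel-attention logits
    kernels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for epoch in (0, 5, 10):
        t = linear_temperature(epoch)
        weights = softmax_with_temperature(logits, t)
        print(epoch, t, mix_kernels(kernels, weights))
    print(sigmoid(0.0))                    # example gate value for a channel
```

Early in training, a high temperature keeps the kernel weights close to uniform so that every static kernel receives gradient. As the temperature falls, the weights sharpen toward the kernels the input favors.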