SGDM: Static-Guided Dynamic Module Make Stronger Visual Models

Wenjie Xing, Zhenchao Cui, Jing Qi

TL;DR

The Static-Guided Dynamic Module (SGDM), which uses a set of asymmetric static convolution kernel parameters to guide the construction of dynamic convolution, is proposed to address two flaws of dynamic weight convolution.

Abstract

The spatial attention mechanism has been widely used to improve object detection performance. However, it is currently limited to static convolutions, which lack content-adaptive features. This paper approaches the problem from the perspective of dynamic convolution. We propose Razor Dynamic Convolution (RDConv) to address two flaws of dynamic weight convolution that make it hard to use in spatial attention mechanisms: 1) it is computationally heavy; 2) spatial information is disregarded when generating weights. First, by using a Razor Operation to generate only certain features, we vastly reduce the parameters of the entire dynamic convolution operation. Second, we add a spatial branch inside RDConv to generate convolutional kernel parameters with richer spatial information. Embedding dynamic convolution also introduces sensitivity to high-frequency noise. We propose the Static-Guided Dynamic Module (SGDM) to address this limitation: SGDM uses a set of asymmetric static convolution kernel parameters to guide the construction of dynamic convolution, bringing the shared-weight mechanism of static convolution to bear on the sensitivity of dynamic convolution to high-frequency noise. Extensive experiments show that multiple object detection backbones equipped with SGDM achieve highly competitive performance gains (e.g., +4% mAP with YOLOv5n on VOC and +1.7% mAP with YOLOv8n on COCO) with a negligible parameter increase (+0.33M on YOLOv5n and +0.19M on YOLOv8n).
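To make the parameter argument concrete, here is a back-of-the-envelope sketch. It is not RDConv: the Razor Operation is not specified in this summary, so the sketch assumes a deliberately simple, hypothetical weight generator (one linear layer mapping a pooled channel descriptor to one depthwise k × k kernel per channel) and models the "certain features" as a hypothetical fraction 1/`ratio` of the channels. The helper names `generator_params` and `compare` are illustrative only.

```python
def generator_params(in_feats: int, out_kernels: int, k: int) -> int:
    """Parameter count of a hypothetical weight generator: one linear layer
    that maps an `in_feats`-dim pooled descriptor to `out_kernels` depthwise
    k x k kernels (weights plus bias)."""
    out_dim = out_kernels * k * k
    return in_feats * out_dim + out_dim


def compare(channels: int, ratio: int, k: int = 3) -> tuple[int, int]:
    """Return (all-feature, subset) generator sizes.

    Assumption: the subset variant reads, and produces kernels for, only
    channels // ratio channels, standing in for a reduced feature set.
    """
    full = generator_params(channels, channels, k)
    sub = channels // ratio
    reduced = generator_params(sub, sub, k)
    return full, reduced


if __name__ == "__main__":
    full, reduced = compare(channels=64, ratio=2)
    print(full, reduced, round(full / reduced, 2))  # 37440 9504 3.94
```

With 64 channels, `ratio=2` and k = 3, the all-feature generator needs 37,440 parameters versus 9,504 for the subset variant, roughly a 4× reduction, because under this assumption the generator's size grows with the square of the number of channels it reads and writes.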

Paper Structure (19 sections, 10 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Illustration of typical dynamic convolution and our RDConv. The former computes dynamic weights from all features, while the latter operates only on intrinsic features.
  • Figure 2: Illustration of the composition of RDConv and the design of the SGDM module. RDConv addresses some problems of dynamic convolution through the Razor Operation and the Spatial Branch. SGDM is a plug-and-play module that applies RDConv and can be inserted into visual models to improve their performance.
  • Figure 3: Training loss curves of YOLOv8n embedded with four different attention mechanisms.
  • Figure 4: Visualization of detection results for images with Gaussian noise using YOLOv8n model with and without SGDM.
  • Figure 5: Comparison of results for different values of $k$ in the spatial branch. The best results are in bold.
  • ...and 1 more figure
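The static-guided idea in SGDM can be pictured with a small kernel-arithmetic sketch. The exact fusion rule is not given in this summary, so the sketch swaps in an assumption: a shared static 1 × k kernel and k × 1 kernel are folded into the centre row and column of a k × k grid (valid because convolution is linear in the kernel), and that static part is then added to a per-sample dynamic kernel. The function names `embed_asymmetric` and `fuse` are hypothetical, and the additive rule is an illustration, not the paper's method.

```python
Kernel = list[list[float]]


def embed_asymmetric(row: list[float], col: list[float]) -> Kernel:
    """Fold a 1 x k (row) and a k x 1 (col) kernel into one k x k kernel.

    Convolution is linear in the kernel, so with centred zero padding and
    stride 1 this k x k kernel gives the same output as running both
    asymmetric kernels and summing the results.
    """
    k = len(row)
    if len(col) != k or k % 2 == 0:
        raise ValueError("row and col must have the same odd length")
    c = k // 2
    out = [[0.0] * k for _ in range(k)]
    for j in range(k):
        out[c][j] += row[j]  # centre row holds the 1 x k kernel
    for i in range(k):
        out[i][c] += col[i]  # centre column holds the k x 1 kernel
    return out


def fuse(static: Kernel, dynamic: Kernel) -> Kernel:
    """Hypothetical guidance rule: add the shared static kernel to a
    per-sample dynamic kernel, element by element."""
    return [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(static, dynamic)]


if __name__ == "__main__":
    # One static kernel, shared by every sample.
    static = embed_asymmetric([1.0, 2.0, 1.0], [1.0, 2.0, 1.0])
    print(static)  # [[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]]
    # Two samples with different (made-up) dynamic kernels.
    dyn_a = [[0.1] * 3 for _ in range(3)]
    dyn_b = [[-0.1] * 3 for _ in range(3)]
    print(fuse(static, dyn_a)[1][1], fuse(static, dyn_b)[1][1])
```

Only the dynamic part varies between samples here; the static component stays fixed across inputs, which is one simple reading of the shared-weight mechanism the abstract describes.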