GmNet: Revisiting Gating Mechanisms From A Frequency View

Yifan Wang, Xu Ma, Yitian Zhang, Zhongruo Wang, Sung-Cheol Kim, Vahid Mirjalili, Vidya Renganathan, Yun Fu

TL;DR

This paper systematically explores the effect of gating mechanisms on the training dynamics of neural networks from a frequency perspective, and proposes the Gating Mechanism Network (GmNet), a lightweight model designed to efficiently utilize the information in various frequency components.

Abstract

Gating mechanisms have emerged as an effective strategy for addressing long-range dependency problems and are now integrated into model designs well beyond recurrent neural networks. Broadly, they provide adaptive control over the information flow while maintaining computational efficiency. However, there is little theoretical analysis of how gating mechanisms work in neural networks. In this paper, inspired by the convolution theorem, we systematically explore the effect of gating mechanisms on the training dynamics of neural networks from a frequency perspective. We investigate the interaction between the element-wise product and activation functions in managing the responses to different frequency components. Leveraging these insights, we propose the Gating Mechanism Network (GmNet), a lightweight model designed to efficiently utilize the information in various frequency components. It minimizes the low-frequency bias present in existing lightweight models. GmNet achieves impressive performance in terms of both effectiveness and efficiency on the image classification task.
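
The convolution theorem that the abstract refers to can be checked numerically. The sketch below is an illustration only, not the paper's analysis: it assumes 1-D signals and the discrete Fourier transform, and the names `circular_convolve`, `x`, and `g` are hypothetical stand-ins for a feature signal and a gate signal. It verifies that an element-wise product in the signal domain equals a (scaled) circular convolution of the two spectra, and shows that multiplying two pure tones produces energy at their sum and difference frequencies.

```python
import numpy as np


def circular_convolve(a, b):
    """Circular convolution of two equal-length 1-D sequences (hypothetical helper)."""
    n = len(a)
    return np.array(
        [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]
    )


rng = np.random.default_rng(0)
n = 16
x = rng.standard_normal(n)  # stand-in for a feature signal (assumption)
g = rng.standard_normal(n)  # stand-in for a gate signal (assumption)

# DFT of the element-wise product ...
lhs = np.fft.fft(x * g)
# ... equals the circular convolution of the two spectra, divided by n.
rhs = circular_convolve(np.fft.fft(x), np.fft.fft(g)) / n
product_theorem_holds = np.allclose(lhs, rhs)

# Two pure tones at frequencies 2 and 3: their product contains
# only the difference (1) and sum (5) frequencies.
t = np.arange(n)
tone_a = np.cos(2 * np.pi * 2 * t / n)
tone_b = np.cos(2 * np.pi * 3 * t / n)
spectrum = np.abs(np.fft.rfft(tone_a * tone_b))
peaks = np.flatnonzero(spectrum > 1e-9)

print(product_theorem_holds, peaks.tolist())
```

The second half of the sketch gives one intuition for why an element-wise product can introduce frequency content that neither input has on its own; how this interacts with activation functions is the subject of the paper itself.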

Paper Structure

This paper contains 33 sections, 3 equations, 17 figures, 15 tables, 2 algorithms.

Figures (17)

  • Figure 1: An illustration of how GLUs affect neural networks in classifying different frequency parts of an image. $\sigma$ denotes the activation function. Starting with a raw image of a 'Tiger cat', we break it down into different frequency bands. The lowest-frequency band shows a recognizable outline, a higher-frequency band retains the general shape of the cat, and the highest-frequency band is almost unrecognizable. The predictions for each component are shown to the left of each model. This example demonstrates two points: 1. Although low-frequency decomposed images closely resemble the originals, recognizing them accurately does not guarantee accurate recognition of the original images, and 2. GLUs improve the networks' ability to learn higher-frequency components effectively.
  • Figure 2: Block design of different variants of ResNet18 where $\odot$ represents the element-wise product and $\sigma$ means the activation function.
  • Figure 3: Comparison among Res18, Res18-Ewp, Res18-Gate-ReLU6 and Res18-Gate-GELU. $r$ is the threshold that sets the boundary between low and high frequencies. We plot the learning curves of ResNet18 and its variants for $100$ epochs, together with the accuracy on the different frequency components $\mathbf{z}_i$. We set $r$ to $10$. All curves of $\mathbf{z}$ are from the test set. The legend is at the top of the figure. We also provide more results with different $r$ and different settings in the appendix.
  • Figure 4: Comparison among different variants of MobileNetv2. Different architectures respond differently to specific frequency components. To ensure an informative comparison, we select a representative frequency threshold for each model; here we set $r$ to $10$. Additional results under other threshold configurations and with other base models are included in the supplementary material.
  • Figure 5: GmNet architecture. GmNet adopts a traditional hybrid architecture, utilizing convolutional layers to down-sample the resolution and double the number of channels at each stage.
  • ...and 12 more figures
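
The captions of Figures 3 and 4 describe a threshold $r$ that separates low from high frequencies, but this page does not state how the split is computed. The sketch below shows one common way to do it, under stated assumptions: a 2-D FFT with the zero frequency shifted to the centre, and a circular mask of radius $r$ around that centre. The function name `split_frequencies` and the random test image are hypothetical and are not taken from the paper.

```python
import numpy as np


def split_frequencies(image, r):
    """Split a 2-D array into low- and high-frequency parts.

    Assumption: "low frequency" means spectrum entries whose Euclidean
    distance from the centred zero frequency is at most r.
    """
    h, w = image.shape
    # Centre the zero-frequency term at index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= r
    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))  # placeholder for a grayscale image
low, high = split_frequencies(image, r=10)

# The two parts are complementary, so they sum back to the input.
print(np.allclose(low + high, image))
```

With $r = 0$ only the zero-frequency term survives, so the low-frequency part is the image mean everywhere; with $r$ larger than the distance to the spectrum's corners, the low-frequency part is the whole image.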