WaveNets: Wavelet Channel Attention Networks

Hadi Salman; Caleb Parks; Shi Yin Hong; Justin Zhan

TL;DR

The paper tackles the information loss that Global Average Pooling (GAP) causes in channel attention (CA) and introduces WaveNet, a wavelet-based channel compression framework that preserves richer inter-channel information. It proves that GAP is equivalent to a recursive approximate Haar wavelet transform, which allows channel attention to be generalized in a principled way through discrete wavelet transforms. It extends this with WaveNet-C, which uses orthogonal, linearly independent wavelet filters to diversify the captured features. The authors validate the approach on ImageNet with a ResNet-34 backbone, reporting state-of-the-art or competitive performance with negligible parameter and compute overhead, and they emphasize that WaveNet can be integrated into existing CA methods with minimal code changes. The work provides a theoretical link between CA and wavelet compression, offers a concrete and easily adoptable enhancement to attention mechanisms, and leaves segmentation and detection tasks with larger networks to future work.

Abstract

Channel attention reigns supreme as an effective technique in the field of computer vision. However, the channel attention proposed by SENet suffers from information loss in feature learning, caused by its use of Global Average Pooling (GAP) to represent each channel as a scalar. Designing effective channel attention mechanisms therefore requires a way to better preserve features when modeling channel inter-dependencies. In this work, we use wavelet transform compression as a solution to the channel representation problem. We first test the wavelet transform as an auto-encoder model equipped with a conventional channel attention module. Next, we test the wavelet transform as a standalone channel compression method. We prove that global average pooling is equivalent to the recursive approximate Haar wavelet transform. With this proof, we generalize channel attention using wavelet compression and name it WaveNet. Our method can be embedded within existing channel attention methods with a couple of lines of code. We test the proposed method on the ImageNet dataset for the image classification task. Our method outperforms the baseline SENet and achieves state-of-the-art results. Our code implementation is publicly available at https://github.com/hady1011/WaveNet-C.
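To make the "couple of lines of code" claim concrete, the following is a minimal sketch of an SE-style channel attention step in which the channel compression function is a swappable argument. It is not the authors' implementation: the function names (`gap_compress`, `haar_compress`, `se_attention`), the random weights, and the layer sizes are all hypothetical, and `haar_compress` assumes a square channel whose side is a power of two with orthonormal Haar filters.

```python
import math
import random

def gap_compress(channel):
    """Global Average Pooling: the mean of a flattened channel."""
    return sum(channel) / len(channel)

def haar_compress(channel):
    """Final Haar approximation coefficient of a square power-of-two channel.

    With orthonormal filters this equals sum / sqrt(N), i.e. GAP scaled
    by sqrt(N), where N is the number of pixels (an assumption of this sketch).
    """
    return sum(channel) / math.sqrt(len(channel))

def se_attention(feature_map, w1, w2, compress=gap_compress):
    """SE-style attention over a list of C flattened channels."""
    z = [compress(ch) for ch in feature_map]            # squeeze: one scalar per channel
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]   # FC + ReLU
    gates = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden))))
             for row in w2]                             # FC + sigmoid
    return [[g * v for v in ch] for g, ch in zip(gates, feature_map)]

random.seed(0)
C, H, W, R = 4, 4, 4, 2   # channels, height, width, reduced width (all hypothetical)
x = [[random.random() for _ in range(H * W)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.5) for _ in range(C)] for _ in range(R)]
w2 = [[random.gauss(0.0, 0.5) for _ in range(R)] for _ in range(C)]

out_gap = se_attention(x, w1, w2)                           # SENet-style squeeze
out_haar = se_attention(x, w1, w2, compress=haar_compress)  # only this argument changes
```

Only the `compress` argument differs between the two calls, which is the sense in which a wavelet-based compression could be dropped into an existing channel attention module.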

Paper Structure (20 sections, 1 theorem, 12 equations, 1 figure, 1 table)

This paper contains 20 sections, 1 theorem, 12 equations, 1 figure, 1 table.

Introduction
Related Work
Visual Attention in CNNs
Wavelet Transforms in Image Processing
Method
Discrete Wavelet Transform (DWT) and Channel Attention (CA)
DWT using Multiplication
DWT using Convolution
Channel Attention
Interdependent Channel Attention
Wavelet Channel Attention
Orthogonal Linearly Independent Channel Attention Module
Wavelet Filter Choice
Experiments
Implementation Details
...and 5 more sections

Key Result

Theorem 1

For an image $X$ of size $H \times W$, GAP is a special case of the 2D DWT: its result is proportional to the level-$\log_2(\max(H, W))$ approximation of the 2D Discrete Haar Wavelet Transform (DHWT).
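The proportionality in Theorem 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's proof: it assumes a square image whose side is a power of two and orthonormal Haar filters, so one approximation step sums each 2x2 block and scales by 1/2. The helper names are hypothetical.

```python
import random

def haar_approx_level(x):
    """One level of the 2D Haar approximation (LL band), orthonormal filters.

    Each non-overlapping 2x2 block is summed and scaled by 1/2.
    """
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 2.0
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def recursive_haar_approx(x):
    """Repeat the LL step until one coefficient remains; return it and the level count."""
    levels = 0
    while len(x) > 1:
        x = haar_approx_level(x)
        levels += 1
    return x[0][0], levels

def gap(x):
    """Global Average Pooling over a 2D image."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))

random.seed(0)
n = 8  # image side; log2(8) = 3 levels are needed
image = [[random.random() for _ in range(n)] for _ in range(n)]
coeff, levels = recursive_haar_approx(image)

# After L levels the coefficient is sum / 2^L, while GAP is sum / 4^L,
# so GAP = coeff / 2^L: the two differ only by a constant factor.
scaled = coeff / 2 ** levels
```

Here `levels` equals $\log_2(8) = 3$, and `scaled` matches `gap(image)` up to floating-point error, consistent with GAP being proportional to the deepest Haar approximation.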

Figures (1)

Figure 1: Illustration of existing channel attention and Orthogonal Interdependent channel attention. The 2D DWT filters are initialized randomly and then orthogonalized using the Gram-Schmidt process. Our method uses a variety of filters, while SENet uses only GAP in channel attention. Best viewed in color.
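The filter construction described in the caption can be sketched as follows: random filters are flattened to vectors and orthonormalized with Gram-Schmidt, then each channel is compressed to a scalar by an inner product with one filter. This is a rough illustration only. The filter count, the spatial size, and the round-robin assignment of filters to channels are assumptions, since this summary does not state how the paper assigns filters.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize a list of vectors with (modified) Gram-Schmidt."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)                       # component along an earlier basis vector
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

random.seed(0)
H, W, NUM_FILTERS, C = 4, 4, 3, 6   # hypothetical sizes
raw = [[random.gauss(0.0, 1.0) for _ in range(H * W)] for _ in range(NUM_FILTERS)]
filters = gram_schmidt(raw)         # orthonormal, hence linearly independent

# Compress each flattened channel to a scalar with one filter.
# Round-robin assignment is an assumption made for this sketch.
channels = [[random.random() for _ in range(H * W)] for _ in range(C)]
scalars = [dot(ch, filters[c % NUM_FILTERS]) for c, ch in enumerate(channels)]
```

Replacing every filter with the constant vector whose entries are all 1/(H*W) would reduce this compression to GAP, which is the single-filter case the caption attributes to SENet.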

Theorems & Definitions (2)

Theorem 1
proof
