DC is all you need: describing ReLU from a signal processing standpoint

Christodoulos Kechris; Jonathan Dan; Jose Miranda; David Atienza

TL;DR

This work provides an exact Fourier-domain description of the ReLU activation by expanding $\sqrt{1+g(t)}$ and decomposing the output into a preserved input spectrum plus higher-frequency and DC components. The DC term is shown to depend on input amplitudes and to be extractable via global average pooling, offering a concrete mechanism by which ReLU influences feature extraction and learning dynamics. Through synthetic simulations and real CNN analyses, the authors demonstrate exponential convergence of the ReLU expansion and that DC cues can steer optimization toward weight configurations near the random initialization. The findings offer a spectral perspective on activation functions, informing architectural choices and highlighting the practical role of DC components in feature learning.
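
As a hedged illustration of where a square root of this form can come from, ReLU can be written through the absolute value, and the absolute value through a square root. The exact definition of $g(t)$ is given in the paper body, so the normalization constant $c$ and the choice of $g$ below are assumptions:

$$
\mathrm{ReLU}(x(t)) = \frac{x(t) + |x(t)|}{2}, \qquad |x(t)| = \sqrt{x(t)^2} = \sqrt{c}\,\sqrt{1 + g(t)}, \qquad g(t) = \frac{x(t)^2}{c} - 1 .
$$

$$
\sqrt{1 + g(t)} = \sum_{k=0}^{\infty} \binom{1/2}{k}\, g(t)^k, \qquad |g(t)| \le 1 .
$$

The $x(t)/2$ term carries the input spectrum unchanged. When $x(t)$ is a sum of sinusoids, each power $g(t)^k$contains a constant plus sinusoids at sums and differences of the input frequencies, which is where the DC and higher-frequency components originate. Under this choice of $g$, the condition $|g(t)| \le 1$ holds whenever $c \ge \max_t x(t)^2 / 2$.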

Abstract

Non-linear activation functions are crucial in Convolutional Neural Networks. However, until now they have not been well described in the frequency domain. In this work, we study the spectral behavior of ReLU, a popular activation function. We use the ReLU's Taylor expansion to derive its frequency domain behavior. We demonstrate that ReLU introduces higher frequency oscillations in the signal and a constant DC component. Furthermore, we investigate the importance of this DC component, where we demonstrate that it helps the model extract meaningful features related to the input frequency content. We accompany our theoretical derivations with experiments and real-world examples. First, we numerically validate our frequency response model. Then we observe ReLU's spectral behavior on two example models and a real-world one. Finally, we experimentally investigate the role of the DC component introduced by ReLU in the CNN's representations. Our results indicate that the DC helps to converge to a weight configuration that is close to the initial random weights.
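
The claim that ReLU adds a DC term and higher-frequency oscillations can be checked numerically on a single sinusoid. The sketch below is not taken from the paper: the amplitude, frequency, and sampling rate are arbitrary illustrative choices, and `relu_amplitude_spectrum` is a hypothetical helper name. For a pure sinusoid of amplitude $A$, the classical half-wave rectifier Fourier series gives a DC level of $A/\pi$, a fundamental of amplitude $A/2$, and harmonics only at even multiples of the input frequency.

```python
import numpy as np


def relu_amplitude_spectrum(amplitude=1.0, f0=5.0, fs=1000.0, duration=1.0):
    """One-sided amplitude spectrum of ReLU(A * sin(2*pi*f0*t)).

    All default values are illustrative assumptions, not the paper's settings.
    """
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = amplitude * np.sin(2 * np.pi * f0 * t)
    y = np.maximum(x, 0.0)  # ReLU

    # Scale the FFT so that each bin reads as a sinusoid amplitude.
    spec = np.abs(np.fft.rfft(y)) / n
    spec[1:] *= 2.0          # fold negative frequencies into the positive side
    if n % 2 == 0:
        spec[-1] /= 2.0      # the Nyquist bin has no negative-frequency twin
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spec


freqs, spec = relu_amplitude_spectrum()
# With fs = 1000 Hz and a 1 s window, bin k sits at k Hz.
print("DC:", spec[0], "expected about", 1 / np.pi)
print("fundamental (5 Hz):", spec[5])
print("2nd harmonic (10 Hz):", spec[10])
print("3rd harmonic (15 Hz):", spec[15])
```

The window holds an integer number of periods, so there is no spectral leakage and each harmonic falls on a single bin.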

Paper Structure (17 sections, 30 equations, 5 figures)

Introduction
Problem Formulation
ReLU in the Frequency Domain
DC Component as a Feature Extractor
Experiments
ReLU Approximation Simulations
Real-World CNNs
DC Components Simplify Feature Extraction
A Minimum (almost) Zero-Training CNN
Conclusion
DC Terms
$\boldsymbol{k = 0}$
$\boldsymbol{k = 1}$
$\boldsymbol{k = 2}$
Exponential sum convergence
...and 2 more sections

Figures (5)

Figure 1: Time (left) and frequency (right) domain representations of the input signal (blue), ReLU (green) and ReLU approximation (orange), eq. \ref{eq:sqrt_sum_expansion}. For this signal the first 100 terms of eq. \ref{eq:relu_sqrt_sum_expansion} are sufficient for a good approximation (0.69 Relative Root Mean Squared Error).
Figure 2: Frequency domain of the input signal (blue) and the outputs of the networks $h_{dif}$ (orange) and $h_{avg}$ (green). Differentiation maintains the same spectral content as its input, so the oscillations introduced by the ReLU operations spread across the entire frequency range. In contrast, low-pass filtering removes the higher frequencies introduced by the ReLU, leading to a band-limited output signal.
Figure 3: Frequency content of the activations (green) from the third (top) and sixth (bottom) convolutional layers of the network of kechris2024kid. The frequency components of the periodic heart signal are presented in blue, while the heart rate is indicated by an orange circle. For each layer, we plot the first 16 filters. In the first three convolutional layers, the ReLUs introduce DC components and higher frequencies at multiples of the heart rate. After the third layer, an average pooling operation reduces the available bandwidth, removing the higher frequencies. The DC component remains.
Figure 4: Training loss (left) and weight distances for the first layer (middle) and second layer (right) during training of the three CNNs: $h_{relu}$ (blue), $h_{linear}$ (orange) and $h_{linear_{DC}}$ (green).
Figure 5: Left: Frequency response of the randomly initialized weights of the convolution. The frequencies for each class are also plotted. Each frequency corresponds to a different $b_i$, hence the initial convolution weights are good enough to classify the signals based on their frequency content. Right: Network output (DC) vs the input frequency for each of the three classes of signals. Each class is portrayed with a different color.
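
The mechanism described in the Figure 5 caption can be sketched numerically: a fixed random convolution scales a sinusoid by the magnitude of its frequency response, ReLU turns that amplitude into a DC level, and global average pooling reads the DC level out. This is an illustrative sketch and not the authors' experiment. The kernel length, random seed, test frequencies, and the use of circular convolution are all assumptions, and `dc_feature` and `predicted_dc` are hypothetical helper names.

```python
import numpy as np


def dc_feature(freq_hz, kernel, fs=1000.0, n=1000):
    """Conv -> ReLU -> global average pooling for a unit-amplitude sinusoid."""
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * freq_hz * t)
    # Circular convolution via the FFT (an assumption that avoids edge effects).
    h = np.fft.rfft(kernel, n)
    y = np.fft.irfft(np.fft.rfft(x) * h, n)
    return np.maximum(y, 0.0).mean()  # ReLU, then global average pooling


def predicted_dc(freq_hz, kernel, fs=1000.0, n=1000):
    """|H(f)| / pi: the DC level of a half-wave rectified sinusoid of amplitude |H(f)|."""
    h = np.fft.rfft(kernel, n)
    k = int(round(freq_hz * n / fs))  # FFT bin of the input frequency
    return np.abs(h[k]) / np.pi


rng = np.random.default_rng(0)
kernel = rng.normal(size=16)  # stands in for randomly initialized conv weights

for f in (7.0, 31.0, 93.0):
    print(f, dc_feature(f, kernel), predicted_dc(f, kernel))
```

Because the pooled output tracks $|H(f)|/\pi$, two input frequencies are distinguishable from the DC level alone whenever the random kernel responds to them with different magnitudes.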
