
Time-Frequency Analysis for Neural Networks

Ahmed Abdeljawad, Elena Cordero

TL;DR

The paper develops a quantitative, phase-space–aware approximation theory for shallow neural networks using modulation spaces and the short-time Fourier transform (STFT). By constructing a time-frequency dictionary of windowed activation functions, it proves dimension-independent convergence rates of order $N^{-1/2}$ in Sobolev norms for targets in $M^{p,q}_m$. It extends these results to global domains through a bounded-shift dictionary and derives consequences for several classical spaces (Feichtinger's algebra, Shubin–Sobolev spaces, Fourier-Lebesgue spaces, and Barron spaces, the last with a dedicated specialization). Numerical experiments show that modulation networks attain smaller Sobolev-norm errors than ReLU baselines. Together, these results connect nonlinear approximation theory, time-frequency analysis, and neural-network design for PDE-related function approximation with explicit, computable constants.
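For readers new to the STFT: with a window $g$, $V_g f(x,\xi)=\int f(t)\,\overline{g(t-x)}\,e^{-2\pi i t\xi}\,dt$, and the standard modulation-space norm is the weighted mixed norm $\|f\|_{M^{p,q}_m}=\big(\int\big(\int|V_g f(x,\xi)|^p\,m(x,\xi)^p\,dx\big)^{q/p}d\xi\big)^{1/q}$. The following minimal Python sketch approximates both by Riemann sums in one dimension. It assumes a Gaussian window and finite $p,q$; the paper's window and normalizations may differ, and the names `stft` and `modulation_norm` are hypothetical illustrations, not the authors' code.

```python
import cmath
import math


def gaussian(t):
    """Gaussian window phi(t) = exp(-pi t^2); an assumed choice of window."""
    return math.exp(-math.pi * t * t)


def stft(f, g, x, xi, t_min=-6.0, t_max=6.0, h=0.05):
    """Riemann sum for V_g f(x, xi) = int f(t) conj(g(t - x)) exp(-2 pi i t xi) dt.

    Assumes a real-valued window g (so conj(g) = g) and that f, g are
    negligible outside [t_min, t_max].
    """
    n = int(round((t_max - t_min) / h))
    total = 0j
    for k in range(n + 1):
        t = t_min + k * h
        total += f(t) * g(t - x) * cmath.exp(-2j * math.pi * t * xi)
    return total * h


def modulation_norm(f, g, p=2.0, q=2.0, m=lambda x, xi: 1.0, radius=4.0, step=0.2):
    """Grid approximation of the mixed norm of V_g f * m in L^{p,q} (d = 1, finite p, q).

    The inner integral runs over x, the outer over xi, matching M^{p,q}_m.
    """
    n = int(round(2 * radius / step))
    grid = [-radius + k * step for k in range(n + 1)]
    outer = 0.0
    for xi in grid:
        inner = sum((abs(stft(f, g, x, xi)) * m(x, xi)) ** p for x in grid) * step
        outer += inner ** (q / p) * step
    return outer ** (1.0 / q)


if __name__ == "__main__":
    print(abs(stft(gaussian, gaussian, 0.5, 0.3)))
    print(modulation_norm(gaussian, gaussian))
```

For $f=g=e^{-\pi t^2}$ the STFT has the closed form $|V_g f(x,\xi)|=2^{-1/2}e^{-\pi(x^2+\xi^2)/2}$ (about $0.4145$ at $(0.5,0.3)$), and Moyal's identity gives $\|V_g f\|_{L^2}=\|f\|_2\|g\|_2=2^{-1/2}\approx 0.7071$, so both printed values can be checked by hand.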

Abstract

We develop a quantitative approximation theory for shallow neural networks using tools from time-frequency analysis. Working in weighted modulation spaces $M^{p,q}_m(\mathbf{R}^{d})$, we prove dimension-independent approximation rates in Sobolev norms $W^{n,r}(Ω)$ for networks whose units combine standard activations with localized time-frequency windows. Our main result shows that for $f \in M^{p,q}_m(\mathbf{R}^{d})$ one can achieve \[ \|f - f_N\|_{W^{n,r}(Ω)} \lesssim N^{-1/2}\,\|f\|_{M^{p,q}_m(\mathbf{R}^{d})}, \] on bounded domains, with explicit control of all constants. We further obtain global approximation theorems on $\mathbf{R}^{d}$ using weighted modulation dictionaries, and derive consequences for Feichtinger's algebra, Fourier-Lebesgue spaces, and Barron spaces. Numerical experiments in one and two dimensions confirm that modulation-based networks achieve substantially better Sobolev approximation than standard ReLU networks, consistent with the theoretical estimates.
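Rates of order $N^{-1/2}$ of this kind are classically obtained from Maurey's empirical method (Proposition 9, on approximation rates in type-2 Banach spaces, appears to be a result of this type): if $f=\mathbb{E}[G]$ for a random dictionary element $G$, the average $f_N$ of $N$ independent copies of $G$ satisfies $\mathbb{E}\|f-f_N\|^2=(\mathbb{E}\|G\|^2-\|f\|^2)/N$ in a Hilbert space. The sketch below checks this numerically in a discretized $L^2$ norm. It is a minimal illustration under stated assumptions: the hypothetical `unit` (a tanh activation multiplied by a Gaussian window) stands in for the paper's time-frequency dictionary, and the error is measured in $L^2$ rather than in $W^{n,r}(Ω)$.

```python
import math
import random


# Hypothetical dictionary unit: tanh activation times a Gaussian window centred at a.
# A stand-in for the paper's windowed activation functions, not their exact definition.
def unit(x, w, b, a):
    return math.tanh(w * x + b) * math.exp(-math.pi * (x - a) ** 2)


N_GRID = 50
GRID = [-3.0 + 6.0 * k / (N_GRID - 1) for k in range(N_GRID)]
H = 6.0 / (N_GRID - 1)  # grid spacing for a Riemann-sum L^2 norm


def l2_sq(values):
    """Discrete squared L^2 norm on GRID."""
    return sum(v * v for v in values) * H


# Finite dictionary with uniform weights; the target is their average (a convex combination).
PARAMS = [(w, b, a) for w in (1.0, 3.0) for b in (-1.0, 0.0, 1.0) for a in (-1.0, 0.0, 1.0)]
ATOMS = [[unit(x, *p) for x in GRID] for p in PARAMS]
TARGET = [sum(col) / len(ATOMS) for col in zip(*ATOMS)]


def expected_sq_error(n):
    """Exact E||f - f_N||^2 = (E||G||^2 - ||f||^2) / N in the discrete norm."""
    mean_atom_sq = sum(l2_sq(g) for g in ATOMS) / len(ATOMS)
    return (mean_atom_sq - l2_sq(TARGET)) / n


def sampled_sq_error(n, rng):
    """Squared error of one N-term empirical average of uniformly drawn atoms."""
    picks = [rng.choice(ATOMS) for _ in range(n)]
    approx = [sum(col) / n for col in zip(*picks)]
    return l2_sq([f - g for f, g in zip(TARGET, approx)])


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (4, 16, 64):
        mc = sum(sampled_sq_error(n, rng) for _ in range(500)) / 500
        # Root-mean-square errors; both columns should roughly halve when N quadruples.
        print(n, math.sqrt(mc), math.sqrt(expected_sq_error(n)))
```

The point of the comparison is the $N^{-1/2}$ scaling of the root-mean-square error, which does not depend on the dimension of the underlying domain; only the constant $\mathbb{E}\|G\|^2-\|f\|^2$ does.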

Paper Structure

This paper contains 23 sections, 26 theorems, 228 equations, and 9 figures.

Key Result

Theorem 3

Let $0 < p_j, q_j \leq \infty$ and $s_j, t_j \in \mathbf{R}$ for $j=1,2$, and consider the polynomial weights $v_{t_j}, v_{s_j}$ defined as in (weightvs). Then, if the following two conditions hold:

Figures (9)

  • Figure 1: Visualizing the tiling of the time-frequency plane. Left: Modulation spaces use a uniform grid. Right: Besov spaces use a dyadic grid where the frequency bandwidth doubles at each scale ($1 \to 2 \to 4$).
  • Figure 2: Visual representation of the admissible regions for the weight indices $s_1$ (left) and $s_2$ (right) as defined in (eq:indices). The solid blue lines indicate the constant values for $p, q \le 1$, while the red shaded areas represent the necessary growth conditions for $p, q > 1$, which depend on the dimension $d$ and derivative order $n$. In this illustration, we set $d=2$ and $n=1$.
  • Figure 3: Training loss over epochs for the modulation and plain ReLU networks (1201 parameters each). Curves show the median over 10 seeds with variability bands.
  • Figure 4: Comparison of plain and modulation model predictions on unseen one-dimensional data using the Adam optimizer. The top row displays the predicted values of the target function $e^{-x^{2}}\sin(3x)$, whereas the bottom row displays the predicted values of its derivative.
  • Figure 5: Training loss over epochs for the modulation and plain ReLU networks (1801 parameters each). Curves show the median over 10 seeds with variability bands.
  • ...and 4 more figures
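Figure 4 evaluates both the target $f(x)=e^{-x^{2}}\sin(3x)$ and its derivative, which by the product rule is $f'(x)=e^{-x^{2}}\big(3\cos(3x)-2x\sin(3x)\big)$. The sketch below computes a midpoint-rule approximation of the $W^{1,r}$ error $\big(\|u-v\|_{L^r}^r+\|u'-v'\|_{L^r}^r\big)^{1/r}$ on an interval. The interval $[-3,3]$ and the perturbed `model` are hypothetical choices for illustration, not the paper's experimental setup.

```python
import math


def target(x):
    return math.exp(-x * x) * math.sin(3 * x)


def target_prime(x):
    # Product rule: d/dx[e^{-x^2} sin 3x] = e^{-x^2} (3 cos 3x - 2x sin 3x)
    return math.exp(-x * x) * (3 * math.cos(3 * x) - 2 * x * math.sin(3 * x))


def sobolev_error(u, du, v, dv, r=2.0, a=-3.0, b=3.0, n=2000):
    """Midpoint-rule approximation of ||u - v||_{W^{1,r}(a, b)} for finite r >= 1."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += (abs(u(x) - v(x)) ** r + abs(du(x) - dv(x)) ** r) * h
    return total ** (1.0 / r)


# Hypothetical "model": the target plus a small high-frequency ripple.
def model(x):
    return target(x) + 0.01 * math.sin(50 * x)


def model_prime(x):
    return target_prime(x) + 0.5 * math.cos(50 * x)


def zero(x):
    return 0.0


l2_only = sobolev_error(target, zero, model, zero)  # derivative term switched off
w12 = sobolev_error(target, target_prime, model, model_prime)
print(l2_only, w12)  # roughly 0.017 versus 0.86
```

The ripple $0.01\sin(50x)$ is tiny in $L^2$ but its derivative $0.5\cos(50x)$ is not, which illustrates why Sobolev-norm comparisons can separate models whose predicted function values look alike.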

Theorems & Definitions (35)

  • Definition 1: Weighted Fourier-Lebesgue Spaces
  • Definition 2: Barron Norm and Barron Space
  • Theorem 3: Guo18SharpWeightedConvolution
  • Lemma 4
  • Proposition 5
  • Lemma 6: Characterization of Shubin–Sobolev Spaces
  • Theorem 7
  • Definition 8
  • Proposition 9: Approximation Rate in Type-2 Banach Spaces
  • Proposition 10: Siegel23CharacterizationVariationSpaces
  • ...and 25 more
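Definition 1 (weighted Fourier-Lebesgue spaces) and Definition 2 (Barron norm) both measure weighted integrals of $|\hat f|$, but their exact normalizations are not reproduced here. As an assumption, the sketch below uses one common convention, $\hat f(\xi)=\int f(x)e^{-2\pi i x\xi}\,dx$ and the Barron-type quantity $\int(1+|\xi|)^s|\hat f(\xi)|\,d\xi$, whose $s=0$ case is the unweighted Fourier-Lebesgue norm $\|\hat f\|_{L^1}$. The helpers `fourier` and `barron_type_norm` are hypothetical names for this illustration in one dimension.

```python
import cmath
import math


def gaussian(x):
    return math.exp(-math.pi * x * x)


def fourier(f, xi, x_max=6.0, h=0.05):
    """Riemann sum for hat f(xi) = int f(x) exp(-2 pi i x xi) dx (d = 1, f negligible outside [-x_max, x_max])."""
    n = int(round(2 * x_max / h))
    total = 0j
    for k in range(n + 1):
        x = -x_max + k * h
        total += f(x) * cmath.exp(-2j * math.pi * x * xi)
    return total * h


def barron_type_norm(f, s=1.0, xi_max=6.0, h=0.01):
    """Approximates int (1 + |xi|)^s |hat f(xi)| d xi; s = 0 gives the FL^1 norm."""
    n = int(round(2 * xi_max / h))
    total = 0.0
    for k in range(n + 1):
        xi = -xi_max + k * h
        total += (1 + abs(xi)) ** s * abs(fourier(f, xi)) * h
    return total


if __name__ == "__main__":
    print(barron_type_norm(gaussian, s=0.0), barron_type_norm(gaussian, s=1.0))
```

For $f(x)=e^{-\pi x^2}$, which equals its own Fourier transform under this convention, the exact values are $1$ for $s=0$ and $1+1/\pi\approx 1.318$ for $s=1$, since $\int|\xi|e^{-\pi\xi^2}\,d\xi=1/\pi$.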