Provable wavelet-based neural approximation
Youngmi Hur, Hyojae Lim, Mikyoung Lim
TL;DR
This work develops a provable wavelet-frame framework for neural network approximation on spaces of homogeneous type, linking activation functions to wavelet averaging kernels to obtain $L^2$-error bounds for approximating functions in the class $\mathcal{L}_1$. For activation functions that satisfy certain decay and symmetry properties, the authors show that the induced wavelet system forms a frame. This yields $N$-term approximations whose error decays like $(N+1)^{-1/2}$, and these approximants can be realized by a $W\vec{B}$-Net with $2N$ hidden nodes. They further extend the theory to oscillatory activations and to non-smooth activations. For the non-smooth case they control the $L^2$-distance between a non-smooth activation and a smooth one, and they provide corollaries that quantify the trade-off between smoothness and approximation accuracy. The results widen the class of activation functions with provable convergence guarantees and indicate how non-smooth activations can be handled while preserving the network structure. Overall, the work strengthens the theoretical foundation of wavelet-based neural approximation and informs architecture choices when activations are oscillatory or non-smooth.
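The two quantitative claims above, an error decaying like $(N+1)^{-1/2}$ and a realization with $2N$ hidden nodes, can be illustrated with a small Python sketch. It is not the paper's $W\vec{B}$-Net construction. The constant `C` is a hypothetical placeholder because the summary does not state it. Pairing two shifted sigmoids into one localized bump is an assumed, textbook way to spend two hidden nodes per term. All function names (`error_bound`, `terms_needed`, `paired_net`, `num_hidden_nodes`) are hypothetical.

```python
import math

import numpy as np


def error_bound(n_terms: int, C: float) -> float:
    """Rate of the form C * (N + 1) ** (-1/2); C is a hypothetical constant."""
    return C / math.sqrt(n_terms + 1)


def terms_needed(tol: float, C: float) -> int:
    """Smallest N >= 0 such that C / sqrt(N + 1) <= tol."""
    return max(0, math.ceil((C / tol) ** 2) - 1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def num_hidden_nodes(centers) -> int:
    # Each of the N terms uses two sigmoid units, so the hidden layer has 2N nodes.
    return 2 * len(centers)


def paired_net(x, coeffs, centers, width=0.5, steepness=10.0):
    """One-hidden-layer network with 2N sigmoid nodes.

    Each of the N terms is the difference of two shifted sigmoids, which gives
    a positive, localized bump around its center (an assumed illustration).
    """
    x = np.asarray(x, dtype=float)[:, None]        # shape (M, 1)
    c = np.asarray(centers, dtype=float)[None, :]  # shape (1, N)
    left = sigmoid(steepness * (x - c + width))    # first N hidden nodes
    right = sigmoid(steepness * (x - c - width))   # second N hidden nodes
    return (left - right) @ np.asarray(coeffs, dtype=float)  # shape (M,)
```

For example, with `C = 1.0` a target error of `0.25` requires `terms_needed(0.25, 1.0) == 15` terms, hence 30 hidden nodes in this two-nodes-per-term layout. At a bump's center, the unit equals $2\sigma(sw) - 1 = \tanh(sw/2)$, where $s$ is the steepness and $w$ the half-width.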
Abstract
In this paper, we develop a wavelet-based theoretical framework for analyzing the universal approximation capabilities of neural networks across a wide range of activation functions. Leveraging wavelet frame theory on spaces of homogeneous type, we derive sufficient conditions on activation functions ensuring that the associated neural network can approximate any function in the given space, together with an error estimate. These sufficient conditions accommodate a variety of smooth activation functions, including ones that exhibit oscillatory behavior. Furthermore, by considering the $L^2$-distance between smooth and non-smooth activation functions, we establish a generalized approximation result that applies to non-smooth activations, with the error explicitly controlled by this distance. This provides increased flexibility in the design of network architectures.
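The quantity that controls the error for non-smooth activations is the $L^2$-distance between a smooth and a non-smooth activation. The Python sketch below is a hedged illustration only. It estimates that distance numerically for one assumed pair: the logistic sigmoid, and a piecewise-linear "hard sigmoid" that matches it in value and slope at the origin. Neither the pair nor the helper names (`hard_sigmoid`, `l2_distance`) come from the paper, and the sketch does not reproduce the paper's error bound. It only shows how the controlling distance can be computed.

```python
import numpy as np


def sigmoid(x):
    # Smooth activation: logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))


def hard_sigmoid(x):
    # Non-smooth activation (assumed example): piecewise-linear ramp with the
    # same value (1/2) and slope (1/4) as the logistic sigmoid at x = 0.
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)


def l2_distance(f, g, half_width=40.0, num_points=200_001):
    """Approximate ||f - g||_{L^2(R)} with the trapezoid rule.

    The integral is truncated to [-half_width, half_width], which assumes
    that f - g decays quickly outside this window.
    """
    x = np.linspace(-half_width, half_width, num_points)
    sq = (f(x) - g(x)) ** 2
    h = x[1] - x[0]
    integral = h * (sq.sum() - 0.5 * (sq[0] + sq[-1]))
    return float(np.sqrt(integral))
```

Here the difference decays exponentially outside $[-2, 2]$, so truncating the integral at $\pm 40$ loses essentially nothing. For activation pairs whose difference decays slowly, the truncation window would need to grow, or the tail would have to be bounded analytically.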
