Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality
Kyeongwon Lee, Lizhen Lin, Jaewoo Park, Seonghyun Jeong
TL;DR
This work studies high-dimensional nonparametric regression for functions in anisotropic or composite Besov spaces, whose structure captures intrinsic dimensionality and thereby mitigates the curse of dimensionality. It proves that sparse Bayesian neural networks (BNNs) with either spike-and-slab or continuous shrinkage priors achieve near-minimax posterior contraction rates that depend on the intrinsic smoothness, and that these rates adapt to unknown smoothness via priors on the network architecture. The posterior contracts at rate $\epsilon_n = n^{-\tilde{s}/(2\tilde{s}+1)}(\log n)^{3/2}$ in the anisotropic case and at rate $\epsilon_n = n^{-\tilde{s}^*/(2\tilde{s}^*+1)}(\log n)^{3/2}$ in the composite case, where $\tilde{s}$ and $\tilde{s}^*$ denote the intrinsic smoothness of the respective function class. The framework covers additive and multiplicative Besov functions as special cases. These findings give a rigorous theoretical basis for the practical effectiveness of BNNs in high-dimensional, structured estimation tasks and offer guidance on choosing priors that promote sparsity and adaptivity.
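To give a feel for the rate above, the following minimal Python sketch evaluates $\epsilon_n$ numerically. This summary does not define $\tilde{s}$, so the sketch assumes the convention common in the anisotropic Besov literature, $\tilde{s} = (\sum_{i=1}^{d} 1/s_i)^{-1}$ for directional smoothness $(s_1, \dots, s_d)$; that formula and the function names `intrinsic_smoothness` and `contraction_rate` are illustrative assumptions, not taken from the paper.

```python
import math


def intrinsic_smoothness(s):
    """Aggregate directional smoothness levels s = (s_1, ..., s_d).

    ASSUMPTION: uses (sum_i 1/s_i)^(-1), a convention common in the
    anisotropic Besov literature; the summary above does not define it.
    """
    return 1.0 / sum(1.0 / s_i for s_i in s)


def contraction_rate(n, s_tilde):
    """Evaluate eps_n = n^(-s_tilde / (2 s_tilde + 1)) * (log n)^(3/2)."""
    return n ** (-s_tilde / (2.0 * s_tilde + 1.0)) * math.log(n) ** 1.5


if __name__ == "__main__":
    # Isotropic case: smoothness 2 in each of d = 4 directions.
    iso = intrinsic_smoothness([2.0] * 4)
    # Anisotropic case: very smooth in three directions, rough in one.
    aniso = intrinsic_smoothness([8.0, 8.0, 8.0, 2.0])
    for n in (10**4, 10**6, 10**8):
        print(n, contraction_rate(n, iso), contraction_rate(n, aniso))
```

Under the assumed definition, the isotropic case with smoothness $s$ in all $d$ directions gives $\tilde{s} = s/d$, so the polynomial part of the rate reduces to the classical $n^{-s/(2s+d)}$. When some directions are much smoother than others, $\tilde{s}$ is larger than in the isotropic case with the same minimum smoothness, and the rate improves accordingly.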
Abstract
This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our analysis shows that Bayesian neural networks equipped with either sparse or continuous shrinkage priors attain the optimal rates, which depend on the intrinsic dimension of the true structures. Moreover, we show that these priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown. The proposed framework accommodates a broad class of functions, including additive and multiplicative Besov functions as special cases. These results advance the theoretical foundations of Bayesian neural networks and provide rigorous justification for their practical effectiveness in high-dimensional, structured estimation problems.
