Posterior Contraction for Sparse Neural Networks in Besov Spaces with Intrinsic Dimensionality

Kyeongwon Lee, Lizhen Lin, Jaewoo Park, Seonghyun Jeong

TL;DR

This work studies high-dimensional nonparametric regression with functions that lie in anisotropic or composite Besov spaces, capturing intrinsic dimensionality to defeat the curse of dimensionality. It proves that sparse Bayesian neural networks with either spike-and-slab or continuous shrinkage priors achieve near-minimax posterior contraction rates that depend on the intrinsic smoothness, and that these rates adapt to unknown smoothness via data-driven priors on network architecture. The results show the posterior concentrates at $\epsilon_n = n^{-\tilde{s}/(2\tilde{s}+1)}(\log n)^{3/2}$ in the anisotropic case and at $\epsilon_n = n^{-\tilde{s}^*/(2\tilde{s}^*+1)}(\log n)^{3/2}$ in the composite case, with adaptation and applicability to additive/multiplicative Besov structures. These findings provide rigorous theoretical underpinnings for the practical effectiveness of BNNs in high-dimensional, structured estimation tasks and offer guidance for priors that promote sparsity and adaptivity.
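To get a feel for the rate stated above, the following minimal sketch evaluates $\epsilon_n = n^{-\tilde{s}/(2\tilde{s}+1)}(\log n)^{3/2}$ numerically. The function name `contraction_rate` and the sample values of $n$ and $\tilde{s}$ are illustrative assumptions, not taken from the paper. The intrinsic smoothness is passed in directly as `s_tilde`, since its definition is not given in this summary.

```python
import math


def contraction_rate(n: int, s_tilde: float) -> float:
    """Evaluate eps_n = n^{-s/(2s+1)} * (log n)^{3/2} for smoothness s = s_tilde.

    `s_tilde` is taken as a given number (hypothetical input); this summary
    does not define how it is computed from the function class.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log n is positive")
    if s_tilde <= 0:
        raise ValueError("s_tilde must be positive")
    polynomial_part = n ** (-s_tilde / (2.0 * s_tilde + 1.0))
    log_part = math.log(n) ** 1.5
    return polynomial_part * log_part


if __name__ == "__main__":
    # Illustrative sample sizes and smoothness levels (assumed values).
    for n in (10**3, 10**5, 10**7):
        rates = [contraction_rate(n, s) for s in (0.5, 1.0, 2.0)]
        print(n, [round(r, 4) for r in rates])
```

For a fixed large $n$, a larger `s_tilde` gives a smaller rate, which reflects the dependence on intrinsic smoothness rather than on the ambient dimension. The logarithmic factor can dominate at small sample sizes, so the value is only informative for large $n$.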

Abstract

This work establishes that sparse Bayesian neural networks achieve optimal posterior contraction rates over anisotropic Besov spaces and their hierarchical compositions. These structures reflect the intrinsic dimensionality of the underlying function, thereby mitigating the curse of dimensionality. Our analysis shows that Bayesian neural networks equipped with either sparse or continuous shrinkage priors attain the optimal rates which are dependent on the intrinsic dimension of the true structures. Moreover, we show that these priors enable rate adaptation, allowing the posterior to contract at the optimal rate even when the smoothness level of the true function is unknown. The proposed framework accommodates a broad class of functions, including additive and multiplicative Besov functions as special cases. These results advance the theoretical foundations of Bayesian neural networks and provide rigorous justification for their practical effectiveness in high-dimensional, structured estimation problems.

Paper Structure

This paper contains 33 sections, 21 theorems, 147 equations, 2 figures, 2 tables.

Key Result

Theorem 3.3

Suppose that Assumptions assum:a-1--assum:a-3 hold, and that the prior distribution in eqn:sparse_prior is placed over $\Theta(L_{1n}, D_{1n}, S_{1n})$. Assume further that the slab density $\tilde{\pi}_{SL}$ satisfies Assumptions cond:ss_support--cond:ss_tail. Then the posterior distribution concentrates in $P_{f_0,\sigma_0}^{(n)}$-probability as $n\rightarrow \infty$, for any $M_n \rightarrow \infty$.
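As a rough illustration of what a spike-and-slab prior over network weights looks like, the sketch below draws one sparse weight vector. It is a generic textbook-style construction under stated assumptions, not the prior in eqn:sparse_prior: the function name `sample_spike_and_slab`, the fixed number of nonzero weights, the uniformly chosen support, and the unit-scale Gaussian and uniform slabs are all hypothetical choices made for the example.

```python
import random
from typing import List, Optional


def sample_spike_and_slab(
    num_weights: int,
    num_nonzero: int,
    slab: str = "gaussian",
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw one weight vector with exactly `num_nonzero` active entries.

    Spike: inactive weights are set exactly to zero.
    Slab: active weights are drawn from N(0, 1) or Uniform(-1, 1).
    All of these choices are assumptions made for illustration only.
    """
    if not 0 <= num_nonzero <= num_weights:
        raise ValueError("num_nonzero must lie between 0 and num_weights")
    rng = rng or random.Random()

    # Choose which coordinates are active, uniformly over supports of this size.
    support = set(rng.sample(range(num_weights), num_nonzero))

    if slab == "gaussian":
        draw = lambda: rng.gauss(0.0, 1.0)
    elif slab == "uniform":
        draw = lambda: rng.uniform(-1.0, 1.0)
    else:
        raise ValueError("slab must be 'gaussian' or 'uniform'")

    return [draw() if i in support else 0.0 for i in range(num_weights)]


if __name__ == "__main__":
    theta = sample_spike_and_slab(20, 4, slab="uniform", rng=random.Random(0))
    print(theta)
```

The two slab options loosely mirror the uniform and Gaussian slab examples listed among the paper's results, but the conditions the paper actually places on the slab density are not reproduced here.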

Figures (2)

  • Figure 1: We illustrate two example functions, $f_1(x) = I(\{x_1 \in [1/2, 1]\}) + \sin (2\pi x_2)$ and $f_2(x) = \lvert x_1 - 1/2 \rvert + (x_2 - 1/2)^2$, and their rotated counterparts $f_1'$ and $f_2'$.
  • Figure 2: Illustration of additive ($f(x) = \sum_{i=1}^d g_i(x_i)$) and multiplicative ($f(x) = \prod_{i=1}^d g_i(x_i)$) composite Besov functions. Each component function $g_i$ depends on a single input dimension ($t^{(1)} = 1$), although the ambient dimension $d^{(0)}=d$ may be much larger.
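The additive and multiplicative structures in the Figure 2 caption, and the two example functions from Figure 1, can be written down directly. The sketch below is only an illustration: the helper names `additive` and `multiplicative` are hypothetical, and the component functions come from the captions or are simple placeholders.

```python
import math
from typing import Callable, Sequence

Component = Callable[[float], float]


def additive(components: Sequence[Component]) -> Callable[[Sequence[float]], float]:
    """Build f(x) = sum_i g_i(x_i) from univariate components g_i."""
    def f(x: Sequence[float]) -> float:
        if len(x) != len(components):
            raise ValueError("input dimension must match the number of components")
        return sum(g(xi) for g, xi in zip(components, x))
    return f


def multiplicative(components: Sequence[Component]) -> Callable[[Sequence[float]], float]:
    """Build f(x) = prod_i g_i(x_i) from univariate components g_i."""
    def f(x: Sequence[float]) -> float:
        if len(x) != len(components):
            raise ValueError("input dimension must match the number of components")
        return math.prod(g(xi) for g, xi in zip(components, x))
    return f


# Figure 1 examples, both additive in (x_1, x_2):
# f_1(x) = I(x_1 in [1/2, 1]) + sin(2 pi x_2)
f1 = additive([
    lambda t: 1.0 if 0.5 <= t <= 1.0 else 0.0,
    lambda t: math.sin(2.0 * math.pi * t),
])
# f_2(x) = |x_1 - 1/2| + (x_2 - 1/2)^2
f2 = additive([
    lambda t: abs(t - 0.5),
    lambda t: (t - 0.5) ** 2,
])

if __name__ == "__main__":
    print(f1((0.75, 0.25)), f2((0.0, 1.0)))
```

Each component takes a single coordinate, matching the caption's point that every $g_i$ depends on one input dimension even when the ambient dimension $d$ is large.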

Theorems & Definitions (60)

  • Example 3.1: Uniform slab prior (lee2022asymptotic; polson2018posterior)
  • Example 3.2: Gaussian slab prior
  • Theorem 3.3: Spike-and-slab prior
  • proof
  • Remark 3.4
  • Example 3.5: Relaxed spike-and-slab (lee2022asymptotic)
  • Example 3.6: Relaxed spike-and-slab; Gaussian slab
  • Remark 3.7
  • Theorem 3.8: Shrinkage prior
  • proof
  • ...and 50 more