Heavy-tailed and Horseshoe priors for regression and sparse Besov rates

Sergios Agapiou, Ismaël Castillo, Paul Egels

TL;DR

This work introduces Oversmoothed heavy-Tailed (OT) priors and Horseshoe priors on wavelet coefficients to achieve adaptive posterior contraction in nonparametric regression across Sobolev and Besov classes, under a range of $L_p$ losses. The authors establish upper contraction bounds showing that OT priors attain near-minimax rates in $L_2$ for Sobolev balls, and extend these results to Besov spaces, including the sparse regime. Lower bounds demonstrate that the OT-scale decay is necessary for full adaptation and that heavy-tailed (HT) priors with polynomially decaying scalings are not adaptive in the undersmoothing regime. They provide the first posterior contraction results for Horseshoe priors in this nonparametric context and demonstrate via simulations that OT priors perform competitively with, and sometimes better than, traditional wavelet-thresholding methods across various signals and losses. The findings highlight heavy-tailed priors as a flexible, computationally tractable approach for adaptive, sparsity-aware function estimation, with practical implications for Bayesian nonparametric regression and Besov-rate recovery.

Abstract

The large variety of functions encountered in nonparametric statistics calls for methods that are flexible enough to achieve optimal or near-optimal performance over a wide variety of functional classes, such as Besov balls, as well as over a large array of loss functions. In this work, we show that a class of heavy-tailed prior distributions on basis function coefficients, introduced in \cite{AC} and called Oversmoothed heavy-Tailed (OT) priors, leads to Bayesian posterior distributions that satisfy these requirements; the case of horseshoe distributions is also investigated, for the first time in the context of nonparametrics, and we show that they fit into this framework. Posterior contraction rates are derived in two settings. The case of Sobolev-smooth signals and $L_2$-risk is considered first, along with a lower bound result showing that the form of the scalings imposed on prior coefficients by the OT prior is necessary to get full adaptation to smoothness. Second, the broader case of Besov-smooth signals with $L_{p'}$-risks, $p' \geq 1$, is considered, and minimax posterior contraction rates, adaptive to the underlying smoothness and including rates in the so-called sparse zone, are derived. We provide an implementation of the proposed method and illustrate our results through a simulation study.

Paper Structure

This paper contains 21 sections, 11 theorems, 145 equations, 6 figures, 1 table.

Key Result

Theorem 1

Suppose, in the sequence model \eqref{def:nseq}, that $f_0 \in \mathcal{S}^{\beta}(F)$ and let $\Pi$ be the OT series prior with independent coefficients $(f_k)$ given by \eqref{def:priork} (in particular, the prior \eqref{def:HSprior} is admissible) and scalings given by \eqref{def:OTprior}. As $n \to \infty$, where $\mathcal{L}_n = \log^{\delta}n$, for some $\delta >0$. The same conclusion holds if the prior $\Pi$ is

Figures (6)

  • Figure 1: White noise model: true function (black), posterior means (blue), 95% credible regions (grey), for $n=10^3, 10^4, 10^5$ top to bottom and for the four considered priors left to right.
  • Figure 2: Spatially inhomogeneous true functions.
  • Figure 3: Average errors in $L_{p'}$ for $p'=1,2,3,4,6$, for four spatially inhomogeneous model truths. Signal-to-noise ratio approximately 7 for all truths, errors averaged over 100 realizations of the noise. Errors for posterior means on left, contraction-type errors on right, for Cauchy OT prior (blue plus sign markers) and Gaussian hierarchical prior (red cross markers). In the left plot, the black circles are Hybrid SureShrink estimation errors.
  • Figure 4: 'Least favourable' truths $f_0^{(i)}, \,i=1,\dots,4$ of unit $B^{3/2}_{1\infty}$-norm, constructed to have non-zero wavelet coefficients only at level $j=2i$, with the non-zero coefficients constructed via randomly permuted stick-breaking, scaled by $2^{-2i}$.
  • Figure 5: Log average errors for four 'least favourable' truths from Figure 4, with corresponding $n=10^{i+1}$. $L_{p'}$-errors for the posterior mean for Cauchy OT priors (left) and for the hybrid SureShrink estimator (right), for $p'=1,2,3,4,6,\infty$.
  • ...and 1 more figures
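
The construction described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stick distribution (Beta(1, `concentration`)), the truncation rule, and the function names `least_favourable_coeffs` and `besov_1_inf_level_norm` are all assumptions introduced here for illustration. The norm check assumes the usual wavelet-sequence form of the Besov norm for $L_2$-normalised wavelets, in which level $j$ contributes $2^{j(s+1/2-1/p)}\|\theta_j\|_p$.

```python
import numpy as np

def least_favourable_coeffs(i, rng, concentration=1.0):
    """Coefficients at wavelet level j = 2*i: permuted stick-breaking weights times 2**(-j).

    The Beta(1, concentration) sticks are an assumed choice; the caption only
    says "randomly permuted stick-breaking".
    """
    j = 2 * i
    m = 2 ** j  # number of wavelet coefficients at level j
    sticks = rng.beta(1.0, concentration, size=m)
    sticks[-1] = 1.0  # truncation: the last stick takes all remaining mass
    # Mass left before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    weights = rng.permutation(sticks * remaining)  # non-negative, sums to 1
    return 2.0 ** (-j) * weights

def besov_1_inf_level_norm(theta, j, s=1.5):
    """Level-j contribution to the B^s_{1,infinity} sequence norm (p = 1)."""
    # Exponent s + 1/2 - 1/p equals s - 1/2 when p = 1.
    return 2.0 ** (j * (s - 0.5)) * np.abs(theta).sum()

rng = np.random.default_rng(0)
for i in range(1, 5):
    theta = least_favourable_coeffs(i, rng)
    print(i, theta.size, besov_1_inf_level_norm(theta, 2 * i))
```

Under these assumptions the weights have unit $\ell_1$ mass, so scaling by $2^{-2i}$ at level $j=2i$ gives a level norm of one for $s=3/2$, $p=1$, which is consistent with the unit $B^{3/2}_{1\infty}$-norm stated in the caption.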

Theorems & Definitions (23)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Remark 3
  • Remark 4
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 13 more