Hierarchical Besov-Laplace priors for spatially inhomogeneous binary classification

Patric Dolmeta, Matteo Giordano

TL;DR

The paper tackles nonparametric Bayesian binary classification when the probability surface $p: [0,1]^d\to[0,1]$ may be spatially inhomogeneous. It introduces a hierarchical Besov-Laplace prior with a hyper-prior on the regularity parameter to adapt to unknown smoothness and maps it through a logistic link to model $p$, enabling edge-preserving posterior inference. The main theoretical result establishes adaptive minimax-optimal $L^1$ contraction rates $\|p-p_0\|_1 \lesssim n^{-\alpha_0/(2\alpha_0+d)}$ for ground truths $p_0\in B^{\alpha_0}_1([0,1]^d)$ with $\alpha_0>d$, without knowledge of $\alpha_0$, complemented by an efficient wpCN-based MCMC strategy for posterior sampling. Empirical simulations in 1D and 2D confirm the method's ability to recover spatially inhomogeneous features and demonstrate clear advantages over Gaussian priors in edge-rich settings, highlighting the practical relevance for spatially resolved classification problems and uncertainty quantification.
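To give a feel for the rate $n^{-\alpha_0/(2\alpha_0+d)}$ quoted above, the following minimal sketch evaluates it at the sample sizes used in the simulations. The function name `contraction_rate` and the choice $\alpha_0 = 2$, $d = 1$ are illustrative assumptions, not values taken from the paper, and the multiplicative constant hidden in $\lesssim$ is ignored.

```python
def contraction_rate(n: int, alpha: float, d: int) -> float:
    """Return n^{-alpha / (2 * alpha + d)}, the rate up to constants.

    `alpha` plays the role of the (unknown) smoothness alpha_0 and `d`
    is the covariate dimension; the theorem assumes alpha_0 > d.
    """
    if alpha <= d:
        raise ValueError("the stated result assumes alpha > d")
    return n ** (-alpha / (2 * alpha + d))


# Illustrative values only: alpha_0 = 2 and d = 1 are assumptions.
for n in (200, 1000, 5000):
    print(n, contraction_rate(n, alpha=2.0, d=1))
```

For fixed $d$, a larger $\alpha_0$ moves the exponent closer to the parametric value $-1/2$, so smoother truths are recovered faster.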

Abstract

We study nonparametric Bayesian binary classification, in the case where the unknown probability response function is possibly spatially inhomogeneous, for example, being generally flat across the domain but presenting localized sharp variations. We consider a hierarchical procedure based on the popular Besov-Laplace priors from inverse problems and imaging, with a carefully tuned hyper-prior on the regularity parameter. We show that the resulting posterior distribution concentrates towards the ground truth at optimal rate, automatically adapting to the unknown regularity. To implement posterior inference in practice, we devise an efficient Markov chain Monte Carlo (MCMC) algorithm based on recent ad-hoc dimension-robust methods for Besov-Laplace priors. We then test the considered approach in extensive numerical simulations, where we obtain a solid corroboration of the theoretical results.
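As a rough illustration of the kind of prior described above, the sketch below draws one random probability response function on $[0,1]$: a truncated wavelet series with i.i.d. Laplace coefficients, mapped through the logistic link. Several choices here are assumptions made only for the example and are not confirmed by this summary: the Haar basis (smoother wavelets are typically needed for higher regularity), the level scaling $2^{-l(\alpha - 1/2)}$ (one common convention for $B^{\alpha}_1$-type Laplace priors in $d = 1$), the fixed value of `alpha`, and the names `haar` and `sample_prior_p`. The hierarchical procedure would additionally draw the regularity from a hyper-prior and rescale the series, neither of which is reproduced here.

```python
import numpy as np


def haar(level: int, shift: int, x: np.ndarray) -> np.ndarray:
    """Haar wavelet 2^{l/2} * psi(2^l x - k) evaluated at points x in [0, 1)."""
    t = (2.0 ** level) * x - shift
    up = ((t >= 0.0) & (t < 0.5)).astype(float)
    down = ((t >= 0.5) & (t < 1.0)).astype(float)
    return (2.0 ** (level / 2.0)) * (up - down)


def sample_prior_p(alpha: float, max_level: int, x: np.ndarray,
                   rng: np.random.Generator) -> np.ndarray:
    """Draw p = logistic(f), with f a truncated Laplace wavelet series.

    Assumed scaling (d = 1): coefficients at level l are multiplied by
    2^{-l * (alpha - 1/2)}; this is an illustrative convention.
    """
    f = np.zeros_like(x, dtype=float)
    for level in range(max_level + 1):
        scale = 2.0 ** (-level * (alpha - 0.5))
        xi = rng.laplace(loc=0.0, scale=1.0, size=2 ** level)
        for shift in range(2 ** level):
            f += scale * xi[shift] * haar(level, shift, x)
    return 1.0 / (1.0 + np.exp(-f))  # logistic link maps f into (0, 1)


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 256, endpoint=False)
p_draw = sample_prior_p(alpha=1.5, max_level=6, x=grid, rng=rng)
```

Because Laplace coefficients have heavier tails than Gaussian ones, individual draws tend to show a few large localized jumps on an otherwise flat profile, which is the behaviour the abstract describes as spatially inhomogeneous.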

Paper Structure

This paper contains 13 sections, 3 theorems, 70 equations, 4 figures, 4 tables.

Key Result

Theorem 1

Let $\Pi_n$ be a hierarchical rescaled Besov-Laplace prior for probability response functions constructed as in Section \ref{Subsec:Prior}. Let $D^{(n)} = \{(Y_i, X_i)\}^n_{i=1}\sim Q_{p_0}^{(n)}$ be a random sample of labeled binary classification data arising from model \ref{Eq:Model} for some fixed $p = p_0$ ...

Figures (4)

  • Figure 1: Left to right: posterior means for Gaussian (solid green) and Besov-Laplace (solid blue) priors, with pointwise $95\%$ credible intervals (shaded regions), for $n = 200, 1000, 5000$, respectively. The ground truth $p_0$ from \ref{Eq:1DTruth} is shown in solid black. Rugs at the bottom mark the covariate values labeled $0$, while rugs at the top mark the covariate values labeled $1$.
  • Figure 2: Left to right: posterior means for Gaussian (solid green) and Laplace (solid blue) priors, with pointwise $95\%$ credible intervals (shaded regions), for $n = 200, 1000, 5000$, respectively. The ground truth $p_0$ from \ref{Eq:1D_block} is shown in solid black.
  • Figure 3: Top to bottom, left to right: posterior means for Gaussian (top) and Besov-Laplace (bottom) priors for increasing sample sizes $n = 200, 1000, 5000$, in the case of the spatially homogeneous ground truth \ref{Eq:2D_skn}, shown in the rightmost panel of both rows.
  • Figure 4: Top to bottom, left to right: posterior means for Gaussian (top) and Besov-Laplace (bottom) priors for increasing sample sizes $n = 200, 1000, 5000$, in the case of the spatially homogeneous ground truth \ref{Eq:2D_bs}, shown in the rightmost panel of both rows.

Theorems & Definitions (7)

  • Remark 1: Rescaling
  • Theorem 1
  • Remark 2: Boundedness away from zero
  • Lemma 2
  • proof
  • Lemma 3
  • proof