
A sparse PAC-Bayesian approach for high-dimensional quantile prediction

The Tien Mai

TL;DR

A novel probabilistic machine learning approach for high-dimensional quantile prediction, with theoretical guarantees in the form of non-asymptotic oracle inequalities that show minimax-optimal prediction error and adaptivity to unknown sparsity.

Abstract

Quantile regression, a robust method for estimating conditional quantiles, has advanced significantly in fields such as econometrics, statistics, and machine learning. In high-dimensional settings, where the number of covariates exceeds the sample size, penalized methods such as the lasso have been developed to exploit sparsity. Bayesian methods, initially connected to quantile regression via the asymmetric Laplace likelihood, have also evolved, though issues with posterior variance have led to new approaches, including pseudo- and score likelihoods. This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction. It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation. Through PAC-Bayes bounds, the method enjoys strong theoretical guarantees: non-asymptotic oracle inequalities showing minimax-optimal prediction error and adaptivity to unknown sparsity. Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.
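The computational route described in the abstract can be illustrated with a minimal Python sketch. It assumes (i) a pseudo-posterior $\hat\rho_\lambda(\theta) \propto \exp(-\lambda r_n(\theta))\,\pi(\theta)$, where $r_n(\theta) = \frac{1}{n}\sum_i \rho_\tau(y_i - x_i^\top\theta)$ is the average check loss; (ii) a scaled Student-type prior $\pi(\theta) \propto \prod_{j=1}^d (s^2 + \theta_j^2)^{-2}$, a common choice in the sparse PAC-Bayes literature but not confirmed by this excerpt; and (iii) a plain unadjusted Langevin update driven by a subgradient of the non-smooth check loss. It ignores any restriction of the prior to an $\ell_1$ ball, and every function name is hypothetical; it is a sketch under these assumptions, not the author's implementation.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))


def risk_subgradient(theta, X, y, tau):
    """A subgradient of r_n(theta) = mean_i rho_tau(y_i - x_i^T theta)."""
    u = y - X @ theta
    # rho_tau'(u) = tau - 1{u < 0}; the chain rule through u = y - X theta flips the sign.
    return -X.T @ (tau - (u < 0).astype(float)) / len(y)


def log_prior_grad(theta, s):
    """Gradient of log pi for the ASSUMED prior pi(theta) ∝ prod_j (s^2 + theta_j^2)^(-2)."""
    return -4.0 * theta / (s**2 + theta**2)


def langevin_quantile(X, y, tau, lam, s, step=1e-3, n_iter=5000, burn_in=1000, seed=0):
    """Unadjusted Langevin sampler targeting exp(-lam * r_n(theta)) * pi(theta).

    Returns the average of the post-burn-in iterates as a posterior-mean estimate.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = np.zeros(d)
    running_sum = np.zeros(d)
    for k in range(n_iter):
        # Gradient of the log pseudo-posterior: -lam * (sub)grad r_n + grad log pi.
        grad = -lam * risk_subgradient(theta, X, y, tau) + log_prior_grad(theta, s)
        # Euler step of the Langevin diffusion plus Gaussian noise of variance 2 * step.
        theta = theta + step * grad + np.sqrt(2.0 * step) * rng.standard_normal(d)
        if k >= burn_in:
            running_sum += theta
    return running_sum / (n_iter - burn_in)


# Usage on synthetic data (illustrative values, not taken from the paper).
rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.standard_normal((n, d))
theta_true = np.zeros(d)
theta_true[:2] = [2.0, -1.0]
y = X @ theta_true + rng.standard_normal(n)
est = langevin_quantile(X, y, tau=0.5, lam=n, s=0.1)
print(np.round(est, 2))
```

The step size is the main tuning knob in this sketch: near $\theta_j = 0$ the prior gradient behaves like $-4\theta_j/s^2$, so as a rough rule the explicit update is stable only when `step` is well below $s^2/2$. The temperature `lam = n` in the usage example is purely illustrative; Theorem 1 analyzes $\lambda = \sqrt{n}$.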

Paper Structure

This paper contains 21 sections, 11 theorems, 77 equations, and 5 tables.

Key Result

Theorem 1

Assume that Assumption assume_X_bounded is satisfied and that our loss function is bounded, i.e. $\ell_\tau (y,x) \in [0,C]$. Take $\lambda= \sqrt{n}$, $\varsigma = ( C_{\rm x} n\sqrt{d})^{-1}$. Then for all $\theta^*$ such that $\| \theta^*\|_1 \leq C_1 - 2d\varsigma$ we have that and with probability at least $1-\varepsilon, \varepsilon\in (0,1)$ that for some constant $\mathcal{C}_1 , \mathca
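Theorem 1 fixes the temperature $\lambda$ and the scale $\varsigma$ explicitly, and restricts the comparison vector $\theta^*$ to an $\ell_1$ ball shrunk by $2d\varsigma$. The short Python sketch below evaluates these choices; the function names are hypothetical, $C_{\rm x}$ is taken to be the constant from Assumption assume_X_bounded and $C_1$ the $\ell_1$ radius in the theorem, and the (truncated) bound itself is not reproduced. Note that the slack simplifies to $2d\varsigma = 2\sqrt{d}/(C_{\rm x} n)$, so it vanishes as long as $\sqrt{d} = o(n)$.

```python
import math


def theorem1_tuning(n, d, C_x):
    """Tuning stated in Theorem 1: lambda = sqrt(n), varsigma = 1 / (C_x * n * sqrt(d))."""
    lam = math.sqrt(n)
    varsigma = 1.0 / (C_x * n * math.sqrt(d))
    return lam, varsigma


def in_oracle_class(theta_star, C1, d, varsigma):
    """Theorem 1's condition on theta*: ||theta*||_1 <= C1 - 2 * d * varsigma."""
    return sum(abs(t) for t in theta_star) <= C1 - 2.0 * d * varsigma


# Example: n = 100 observations, d = 400 covariates (d > n), C_x = 1.
lam, varsigma = theorem1_tuning(n=100, d=400, C_x=1.0)
# Here lam = 10 and varsigma = 1 / 2000, so the slack 2 * d * varsigma equals 0.4.
print(lam, varsigma)
# ||theta*||_1 = 3 lies inside the shrunk ball of radius 5 - 0.4 = 4.6.
print(in_oracle_class([2.0, -1.0], C1=5.0, d=400, varsigma=varsigma))
```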

Theorems & Definitions (20)

  • Theorem 1
  • Corollary 1
  • Proposition 1
  • Theorem 2
  • Remark 1
  • Corollary 2
  • Proposition 2
  • Remark 2
  • Proof of Theorem \ref{thm_main_2}
  • Proof of Proposition \ref{propo_slow}
  • ...and 10 more