Misclassification excess risk bounds for PAC-Bayesian classification via convexified loss
The Tien Mai
TL;DR
The paper addresses the problem of bounding misclassification excess risk for PAC-Bayesian classification when using a convex surrogate loss. It develops a PAC-Bayes relative bound in expectation under a low-noise/margin condition, showing that with the Gibbs posterior and a calibrated temperature, the misclassification risk can be controlled by a combination of the convex-loss excess and a KL-divergence penalty, yielding fast n^{-1} rates. The framework is illustrated through two nontrivial applications: high-dimensional sparse classification with a sparsity-promoting prior achieving s^* log(d/s^*)/n bounds, and 1-bit matrix completion with a low-rank prior achieving r(d_1+d_2)/n bounds (up to log factors), both minimax-optimal. These results extend PAC-Bayesian theory beyond risk bounds for convexified losses to direct misclassification risk, providing practical guarantees for convex PAC-Bayesian methods in classification tasks with large or structured parameter spaces.
Abstract
PAC-Bayesian bounds have proven to be a valuable tool for deriving generalization bounds and for designing new learning algorithms in machine learning. However, they typically focus on providing generalization bounds with respect to a chosen loss function. In classification tasks, due to the non-convex nature of the 0-1 loss, a convex surrogate loss is often used, and thus current PAC-Bayesian bounds are primarily specified for this convex surrogate. This work shifts its focus to providing misclassification excess risk bounds for PAC-Bayesian classification when using a convex surrogate loss. Our key ingredient here is to leverage PAC-Bayesian relative bounds in expectation rather than relying on PAC-Bayesian bounds in probability. We demonstrate our approach in several important applications.
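To make the setting concrete, the following is a minimal sketch of a Gibbs posterior built on a convex surrogate, under assumptions that are not taken from the paper: the hinge loss as the surrogate, a finite set of candidate linear classifiers in place of a continuous parameter space, and a uniform prior by default. The names `gibbs_posterior_weights`, `thetas`, and `lam` (the inverse temperature) are hypothetical and chosen only for illustration.

```python
import numpy as np

def hinge_loss(margins):
    # Convex surrogate for the 0-1 loss (assumed choice for this sketch).
    return np.maximum(0.0, 1.0 - margins)

def gibbs_posterior_weights(X, y, thetas, lam, log_prior=None):
    """Gibbs posterior over a finite candidate set (illustrative only).

    X: (n, d) features; y: (n,) labels in {-1, +1};
    thetas: (K, d) candidate linear classifiers; lam: inverse temperature.
    The weight of candidate k is proportional to
    prior_k * exp(-lam * empirical_hinge_risk_k).
    """
    margins = y[:, None] * (X @ thetas.T)      # shape (n, K)
    risks = hinge_loss(margins).mean(axis=0)   # empirical surrogate risk per candidate
    if log_prior is None:
        # Uniform prior over the K candidates (assumption).
        log_prior = np.full(len(thetas), -np.log(len(thetas)))
    log_w = log_prior - lam * risks
    log_w -= log_w.max()                       # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum(), risks

def misclassification_rate(X, y, theta):
    # Empirical 0-1 risk of the classifier sign(<theta, x>).
    return np.mean(np.sign(X @ theta) != y)
```

With `lam = 0` the weights reduce to the prior, and as `lam` grows the posterior concentrates on the candidates with the smallest empirical surrogate risk. The 0-1 risk of any candidate can then be inspected with `misclassification_rate`, which is the quantity the excess risk bounds in this work target.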
