Wasserstein Gradient Boosting: A Framework for Distribution-Valued Supervised Learning
Takuo Matsubara
TL;DR
This paper presents Wasserstein gradient boosting (WGBoost), a framework for distribution-valued supervised learning that outputs a particle-based estimate of the input-conditioned distribution $p(\theta|x)$. Combining Wasserstein gradient flows with gradient boosting, WGBoost iteratively updates $N$ ensembles whose outputs $\theta^1(x), \dots, \theta^N(x)$ act as particles forming a nonparametric distribution over $\Theta$ for each input, using a KL divergence loss and kernel-smoothed gradients; a diagonal Newton variant (WEvidential) yields a second-order, computationally efficient approach. The default evidential learning setting uses individual-level posteriors $p(\theta|y_i)$ as the output distributions of the training set, enabling predictive distributions via $p(y|x) = \frac{1}{N}\sum_{n=1}^{N} p(y|\theta^n(x))$ and Bayes actions that minimise average risk. Empirically, WGBoost improves probabilistic forecasts and out-of-distribution detection on real-world tabular data compared with established uncertainty quantification methods.
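The particle-averaged predictive distribution in the summary above can be illustrated with a minimal sketch. It assumes, purely for illustration, a Gaussian response distribution whose parameter $\theta = (\mu, \sigma)$ is represented by $N$ particles for a single input. The function name `predictive_density` and the particle values are hypothetical and are not taken from the paper.

```python
import numpy as np

def predictive_density(y, particles):
    """Average the Gaussian density p(y | theta^n) over N particles.

    particles: array of shape (N, 2); each row is a hypothetical
    particle theta^n = (mean, standard deviation) for one input x.
    """
    particles = np.asarray(particles, dtype=float)
    mu, sigma = particles[:, 0], particles[:, 1]
    # Gaussian density of y under each particle's parameters
    dens = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # p(y|x) = (1/N) * sum_n p(y | theta^n(x))
    return float(dens.mean())

# Three illustrative (made-up) particles for a single input
particles = np.array([[0.0, 1.0], [0.5, 1.2], [-0.3, 0.8]])
density_at_zero = predictive_density(0.0, particles)
```

Because the result is an equally weighted mixture of valid densities, it is itself a valid density, and spread among the particles shows up as extra width in the predictive distribution.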
Abstract
Gradient boosting is a sequential ensemble method that fits a new weak learner to pseudo-residuals at each iteration. We propose Wasserstein gradient boosting, a novel extension of gradient boosting that instead fits each new weak learner to alternative pseudo-residuals, namely the Wasserstein gradients of loss functionals of the probability distributions assigned at each input. The method addresses distribution-valued supervised learning, where the output value paired with each input in the training dataset is a probability distribution. In classification and regression, a model typically returns, for each input, a point estimate of a parameter of a noise distribution specified for a response variable, such as the class probability parameter of a categorical distribution specified for a response label. A main application of Wasserstein gradient boosting in this paper is tree-based evidential learning, which returns a distributional estimate of the response parameter for each input. We empirically demonstrate that Wasserstein gradient boosting yields superior probabilistic predictions compared with existing uncertainty quantification methods.
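For readers unfamiliar with the baseline scheme, the following is a minimal sketch of standard gradient boosting for squared loss, where each weak learner is fitted to the pseudo-residuals. It is not the WGBoost algorithm: as described above, WGBoost replaces these residuals with Wasserstein gradients. The helper names `fit_stump` and `gradient_boost`, the one-split stump learner, the synthetic data, and the hyperparameter values are all assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to targets r on a 1-D input x."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds (right side stays non-empty)
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def gradient_boost(x, y, n_iter=200, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the pseudo-residuals."""
    base = y.mean()
    learners = []
    pred = np.full_like(y, base, dtype=float)
    for _ in range(n_iter):
        residual = y - pred  # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residual)
        learners.append(stump)
        pred = pred + lr * stump(x)
    return lambda z: base + lr * sum(h(z) for h in learners)

# Synthetic 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

model = gradient_boost(x, y)
train_mse = float(np.mean((model(x) - y) ** 2))
baseline_mse = float(np.mean((y.mean() - y) ** 2))
```

Comparing `train_mse` with `baseline_mse` shows how much the sequence of weak learners improves on a constant prediction. In the distribution-valued setting, the scalar prediction per input would be replaced by a set of particles, and the residual by a gradient direction for each particle.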
