Composite Quantile Regression With XGBoost Using the Novel Arctan Pinball Loss
Laurens Sluijterman, Frank Kreuwel, Eric Cator, Tom Heskes
TL;DR
The paper tackles composite quantile regression with XGBoost, where the traditional pinball loss is ill-suited to the model's second-order optimization because it is nondifferentiable at zero and has a vanishing second derivative. It introduces the arctan pinball loss, a smooth alternative with a non-vanishing second derivative, which lets a single XGBoost model predict multiple quantiles simultaneously and reduces quantile crossings. The authors derive theoretical properties and provide practical recommendations for hyperparameters and model setup. Empirical results on toy data, UCI benchmarks, and electricity-grid substation data demonstrate competitive coverage with substantially fewer crossings and highlight the method's scalability and efficiency, along with considerations for calibration and extrapolation. A public implementation accompanies the work to facilitate adoption in uncertainty quantification and risk-aware forecasting tasks.
Abstract
This paper explores the use of XGBoost for composite quantile regression. XGBoost is a highly popular model renowned for its flexibility, efficiency, and capability to deal with missing data. Its optimization uses a second-order approximation of the loss function, which complicates the use of loss functions with a zero or vanishing second derivative. Quantile regression, a popular approach to obtain conditional quantiles when point estimates alone are insufficient, unfortunately uses such a loss function: the pinball loss. Existing workarounds are typically inefficient and can result in severe quantile crossings. In this paper, we present a smooth approximation of the pinball loss, the arctan pinball loss, that is tailored to the needs of XGBoost. Specifically, contrary to other smooth approximations, the arctan pinball loss has a relatively large second derivative, which makes it more suitable for use in the second-order approximation. Using this loss function enables the simultaneous prediction of multiple quantiles, which is more efficient and results in far fewer quantile crossings.
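To make the abstract's argument concrete, the following is a minimal NumPy sketch rather than the authors' implementation. The abstract does not state the loss formula, so the form used here, (tau - 1/2 + arctan(u/s)/pi) * u + s/pi with residual u = y - prediction and smoothing parameter s, is an assumption, as are all function names and the values of tau and s. Under that assumption, the sketch computes the first and second derivatives that a second-order method would need and compares the smooth loss with the standard pinball loss.

```python
import numpy as np

def pinball(u, tau):
    """Standard pinball loss for residual u = y - prediction."""
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def arctan_pinball(u, tau, s):
    """Assumed smooth loss: (tau - 1/2 + arctan(u/s)/pi) * u + s/pi."""
    return (tau - 0.5 + np.arctan(u / s) / np.pi) * u + s / np.pi

def arctan_pinball_grad_hess(u, tau, s):
    """First and second derivative of the assumed loss with respect to u."""
    z = u / s
    grad = tau - 0.5 + np.arctan(z) / np.pi + z / (np.pi * (1.0 + z**2))
    hess = 2.0 / (np.pi * s * (1.0 + z**2) ** 2)
    return grad, hess

# Illustrative values only: tau is the quantile level, s the smoothing parameter.
u = np.linspace(-3.0, 3.0, 13)
tau, s = 0.9, 0.1
grad, hess = arctan_pinball_grad_hess(u, tau, s)

# Largest deviation of the smooth loss from the pinball loss on this grid.
max_gap = np.max(np.abs(arctan_pinball(u, tau, s) - pinball(u, tau)))
```

Under the assumed form, the second derivative is strictly positive for every residual, whereas the pinball loss has zero second derivative wherever it is differentiable. The derivatives here are taken with respect to the residual u; a custom objective expressed in terms of the prediction would flip the sign of the gradient and leave the second derivative unchanged.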
