Multivariate Forecasting of Bitcoin Volatility with Gradient Boosting: Deterministic, Probabilistic, and Feature Importance Perspectives

Grzegorz Dudek, Mateusz Kasprzyk, Paweł Pełka

TL;DR

This paper develops a framework for forecasting Bitcoin realized volatility (RV) with LightGBM (LGBM), covering both deterministic and probabilistic forecasts. It introduces two quantile methods, direct pinball-loss regression and residual simulation (QRS), and uses 69 predictors plus shock indicators to analyze driver importance. Empirical results show that LGBM-based forecasts outperform econometric and baseline ML approaches, with QRS-LGBM delivering the best probabilistic calibration and sharp prediction intervals; feature importance consistently points to lagged RV, trading volume, Google Trends, and market cap as key drivers. The work has practical implications for risk management and trading, and it highlights computational efficiency, robustness, and avenues for future work such as SHAP-based attribution and alternative probabilistic frameworks.

Abstract

This study investigates the application of the Light Gradient Boosting Machine (LGBM) model for both deterministic and probabilistic forecasting of Bitcoin realized volatility. Utilizing a comprehensive set of 69 predictors -- encompassing market, behavioral, and macroeconomic indicators -- we evaluate the performance of LGBM-based models and compare them with both econometric and machine learning baselines. For probabilistic forecasting, we explore two quantile-based approaches: direct quantile regression using the pinball loss function, and a residual simulation method that transforms point forecasts into predictive distributions. To identify the main drivers of volatility, we employ gain-based and permutation feature importance techniques, consistently highlighting the significance of trading volume, lagged volatility measures, investor attention, and market capitalization. The results demonstrate that LGBM models effectively capture the nonlinear and high-variance characteristics of cryptocurrency markets while providing interpretable insights into the underlying volatility dynamics.
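
The two quantile-based approaches named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names (`pinball_loss`, `empirical_quantile`, `residual_simulation_quantiles`), the toy numbers, and the choice to resample past residuals with replacement are all assumptions made for illustration, since this section does not spell out the exact residual simulation procedure. It uses only the Python standard library; in practice the point forecast would come from a fitted LGBM model.

```python
import random


def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss at quantile level q in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def residual_simulation_quantiles(point_forecast, residuals, quantiles,
                                  n_draws=5000, seed=0):
    """Turn a point forecast into predictive quantiles.

    Assumption: past residuals (actual minus forecast) are resampled with
    replacement and added to the point forecast; the quantiles of the
    simulated values form the predictive distribution.
    """
    rng = random.Random(seed)
    draws = [point_forecast + rng.choice(residuals) for _ in range(n_draws)]
    return [empirical_quantile(draws, q) for q in quantiles]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (e.g. log realized volatility).
    past_residuals = [-0.30, -0.10, -0.05, 0.00, 0.05, 0.15, 0.40]
    levels = [0.05, 0.50, 0.95]
    print(residual_simulation_quantiles(-4.2, past_residuals, levels))
    print(pinball_loss([-4.0, -4.5], [-4.2, -4.2], q=0.9))
```

In direct quantile regression, a separate model is trained for each quantile level by minimizing the pinball loss; the residual simulation route needs only one point-forecast model plus its historical errors.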

Paper Structure

This paper contains 40 sections, 10 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Examples of exogenous predictors of different types (trading volume and S&P 500 variables).
  • Figure 2: Target variable: Bitcoin RV (RVBTC) and its logarithm (ln_RVBTC).
  • Figure 3: Example of an original predictor and its corresponding shock indicator.
  • Figure 4: LGBM model.
  • Figure 5: Loss functions for point (left) and quantile (right) forecasting.
  • ...and 12 more figures