Estimating Zero-inflated Negative Binomial GAMLSS via a Balanced Gradient Boosting Approach with an Application to Antenatal Care Data from Nigeria

Alexandra Daub, Elisabeth Bergherr

TL;DR

To examine the influence of socio-economic factors on the distribution of the number of antenatal care visits in Nigeria, this work generalizes boosting of GAMLSS with shrunk optimal step lengths to base-learners beyond simple linear models and to a more complex response variable distribution.

Abstract

Statistical boosting algorithms are renowned for their intrinsic variable selection and enhanced predictive performance compared to classical statistical methods, making them especially useful for complex models such as generalized additive models for location, scale and shape (GAMLSS). Boosting this model class can suffer from imbalanced updates across the distribution parameters as well as long computation times. Shrunk optimal step lengths have been shown to address these issues. To examine the influence of socio-economic factors on the distribution of the number of antenatal care visits in Nigeria, we generalize boosting of GAMLSS with shrunk optimal step lengths to base-learners beyond simple linear models and to a more complex response variable distribution. In an extensive simulation study and in the application, we demonstrate that shrunk optimal step lengths yield a more balanced regularization of the overall model and enhance computational efficiency across diverse settings, in particular in the presence of base-learners that penalize the size of the fit.

Paper Structure (25 sections, 25 equations, 19 figures, 9 tables, 1 algorithm)

Figures (19)

  • Figure 1: Coefficient paths for a Gaussian location and scale model in the simulation setting without an additional non-linear effect. Dark blue paths represent informative effects and gray paths uninformative effects. The dashed and dotted vertical lines represent potential stopping iterations
  • Figure 2: Shrunk optimal step lengths for varying levels of penalization of the base-learner representing the categorical effect (columns) in the Gaussian simulation setting
  • Figure 3: Comparison of the penalty parameter and the mean step length in the first 100 iterations of a simulation run for base-learners representing different effects (columns) in the Gaussian simulation setting. For the explicit specification of the base-learners, see the simulation section
  • Figure 4: Distribution of the coefficient estimates in the Gaussian simulation setting with categorical effects. The red horizontal lines represent the true coefficients
  • Figure 5: Partial effects of the informative non-linear effect in the Gaussian simulation setting. The red dashed lines represent the true partial effect
  • ...and 14 more figures