Boosted generalized normal distributions: Integrating machine learning with operations knowledge

Ragip Gurlek; Francis de Vericourt; Donald K. K. Lee

TL;DR

This work addresses the gap between point predictions and distributional forecasts in operations by introducing the Boosted Generalized Normal Distribution ($b$GND), which models $Y|X$ as a generalized normal distribution whose covariate-dependent location $\mu(x)$ and scale $b(x)$ are learned via gradient boosting. By fixing the shape parameter $\gamma$ and decoupling the estimation of $\mu(x)$ and $b(x)$, the authors establish statistical consistency and provide a robust estimation algorithm based on sample-splitting. Empirically, on data from a large emergency department, $b$GND improves CRPS by about $6.1\%$ for wait times and $8.8\%$ for service times relative to a distribution-agnostic benchmark. Further analysis suggests that these gains translate into a 9% increase in patient satisfaction and a 4% reduction in mortality for myocardial infarction patients. The paper demonstrates the practical value of integrating operations knowledge with ML for distributional forecasting and highlights the approach’s potential applicability across healthcare operations and other domains requiring reliable distributional predictions.
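To make the roles of $\mu(x)$, $b(x)$ and $\gamma$ concrete, the following is a minimal sketch of the negative log-likelihood that a GND-based model would minimize. It assumes the standard GND density $f(y) = \frac{\gamma}{2 b\,\Gamma(1/\gamma)} \exp\{-(|y-\mu|/b)^{\gamma}\}$; the paper's exact parameterization may differ, and the function name `gnd_neg_log_lik` is hypothetical rather than taken from the authors' code. The per-observation lists `mu` and `b` stand in for the boosted estimates of $\mu(x_i)$ and $b(x_i)$.

```python
import math

def gnd_neg_log_lik(y, mu, b, gamma):
    """Average negative log-likelihood under an assumed GND density
    f(y) = gamma / (2 b Gamma(1/gamma)) * exp(-(|y - mu| / b) ** gamma).

    y, mu, b are equal-length sequences (one location and scale per
    observation); gamma is a single fixed shape parameter.
    """
    if gamma <= 0:
        raise ValueError("shape parameter gamma must be positive")
    total = 0.0
    for yi, mi, bi in zip(y, mu, b):
        if bi <= 0:
            raise ValueError("scale b must be positive")
        # -log f(y) = log(2b) + log Gamma(1/gamma) - log(gamma) + (|y - mu| / b)^gamma
        total += (math.log(2.0 * bi) + math.lgamma(1.0 / gamma)
                  - math.log(gamma) + (abs(yi - mi) / bi) ** gamma)
    return total / len(y)

# gamma = 2 with b = sigma * sqrt(2) recovers the normal distribution;
# gamma = 1 recovers the Laplace distribution with scale b.
sigma = 1.5
normal_case = gnd_neg_log_lik([0.3], [0.0], [sigma * math.sqrt(2.0)], 2.0)
laplace_case = gnd_neg_log_lik([0.3], [0.0], [2.0], 1.0)
```

Under this parameterization, fixing $\gamma$ selects the distribution family (for example normal or Laplace), while the covariate-dependent functions determine where each observation's distribution is centered and how dispersed it is.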

Abstract

Applications of machine learning (ML) techniques to operational settings often face two challenges: i) ML methods mostly provide point predictions whereas many operational problems require distributional information; and ii) they typically do not incorporate the extensive body of knowledge in the operations literature, particularly the theoretical and empirical findings that characterize specific distributions. We introduce a novel and rigorous methodology, the Boosted Generalized Normal Distribution ($b$GND), to address these challenges. The Generalized Normal Distribution (GND) encompasses a wide range of parametric distributions commonly encountered in operations, and $b$GND leverages gradient boosting with tree learners to flexibly estimate the parameters of the GND as functions of covariates. We establish $b$GND's statistical consistency, thereby extending this key property to special cases studied in the ML literature that lacked such guarantees. Using data from a large academic emergency department in the United States, we show that the distributional forecasting of patient wait and service times can be meaningfully improved by leveraging findings from the healthcare operations literature. Specifically, $b$GND performs 6% and 9% better than the distribution-agnostic ML benchmark used to forecast wait and service times, respectively. Further analysis suggests that these improvements translate into a 9% increase in patient satisfaction and a 4% reduction in mortality for myocardial infarction patients. Our work underscores the importance of integrating ML with operations knowledge to enhance distributional forecasts.
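The TL;DR reports the forecasting gains in terms of CRPS (continuous ranked probability score), a score for distributional forecasts where lower is better. As a rough illustration of what is being measured, the sketch below estimates CRPS for one observation from draws of a forecast distribution, using the well-known identity $\mathrm{CRPS}(F, y) = E|X - y| - \tfrac{1}{2} E|X - X'|$ with $X, X'$ independent draws from $F$. The function name `crps_from_samples` is hypothetical, and this sample-based estimator is an assumption for illustration; the paper may compute CRPS differently.

```python
def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one observed outcome y.

    Uses CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with both expectations
    replaced by averages over the forecast draws. The pairwise term is
    O(n^2), which is fine for a small illustration.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast draw")
    # Average distance between the forecast draws and the observation
    dist_to_obs = sum(abs(x - y) for x in samples) / n
    # Average distance between pairs of forecast draws (spread of the forecast)
    spread = sum(abs(xi - xj) for xi in samples for xj in samples) / (n * n)
    return dist_to_obs - 0.5 * spread

# A point forecast (single draw) reduces to absolute error.
point_score = crps_from_samples([2.0], 5.0)      # 3.0
# A two-point forecast straddling the observation.
spread_score = crps_from_samples([0.0, 1.0], 0.5)  # 0.25
```

Averaging such scores over a test set gives a single number per model, so a relative improvement like the 6% and 9% quoted above compares the average score of one model against that of a benchmark.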

Paper Structure (17 sections, 9 theorems, 58 equations, 3 figures, 5 tables, 2 algorithms)

Introduction
Boosted GND
The Value of Operations-Informed Parametric Models
Estimation Algorithm for bGND
Statistical Consistency of bGND
Consistency of $(\hat{\mu},\hat{b})$
Supporting results
Concentration inequalities.
Minimization of empirical risk.
Forecasting ED Wait and Service Times with bGND
Discussion
Boosted estimation of log-scale parameter $\beta(x)$
Example of non-convexity of the expected negative log-likelihood surface
Descriptive Statistics
Experiments
...and 2 more sections

Key Result

Proposition 1

Under Assumptions asm:x_bound-asm:true_parm,

Figures (3)

Figure 1: Patient wait time density conditional on time of arrival
Figure EC.1: Summary of the numeric variables
Figure EC.2: Log-normal density fits (red dashed lines) overlaid on the empirical histograms of service times conditional on time of arrival.

Theorems & Definitions (12)

Remark 1
Remark 2
Remark 3
Proposition 1
Lemma 1
Proposition 2
Lemma 2
Lemma 3
Lemma 4
Lemma 5
...and 2 more
