Regularized zero-inflated Bernoulli regression model

Mouhamed Ndoye; Aba Diop

TL;DR

This work addresses high-dimensional zero-inflated binary data by proposing a regularized zero-inflated Bernoulli regression model (ZIBerRM). It develops penalized maximum likelihood estimators under LASSO, Ridge, and Elastic-net penalties, establishes their existence, consistency, and asymptotic normality, and selects tuning parameters in a data-driven way via information criteria. In simulations with 25% and 50% zero inflation, Elastic-net consistently achieves the best balance of bias, precision, and coverage compared with LASSO and Ridge, especially at larger sample sizes. A real-data application demonstrates improved variable selection and predictive performance in zero-inflated contexts, underscoring the method's practical relevance for high-dimensional binary outcomes with excess zeros.
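The penalized criterion can be written in a common elastic-net form. This is a sketch under stated assumptions, not the paper's exact notation. It assumes the usual two-part structure: subject $i$ is susceptible with probability $p_i$, given by a logistic model in covariates $Z_i$ with coefficients $\gamma$ (notation introduced here for illustration). A susceptible subject then has $Y_i = 1$ with probability $\pi_i$, given by a logistic model in $X_i$ with coefficients $\beta$.

$$
\pi_i = \frac{\exp(X_i^\top\beta)}{1+\exp(X_i^\top\beta)}, \qquad
p_i = \frac{\exp(Z_i^\top\gamma)}{1+\exp(Z_i^\top\gamma)}, \qquad
P(Y_i = 1 \mid X_i, Z_i) = p_i\,\pi_i,
$$

$$
\ell_n(\beta,\gamma) = \sum_{i=1}^{n}\Big[\,Y_i\log(p_i\pi_i) + (1-Y_i)\log(1-p_i\pi_i)\Big],
$$

$$
(\hat\beta_\lambda,\hat\gamma_\lambda) = \arg\max_{\beta,\gamma}\Big\{\ell_n(\beta,\gamma) - n\lambda\sum_{j}\Big[\alpha\,|\theta_j| + \frac{1-\alpha}{2}\,\theta_j^2\Big]\Big\}.
$$

Here $\theta$ collects the non-intercept entries of $\beta$ and $\gamma$, and $\lambda \ge 0$ is the tuning parameter. Setting $\alpha = 1$ gives the LASSO, $\alpha = 0$ gives Ridge, and $0 < \alpha < 1$ gives the Elastic-net. One possible information criterion for choosing $\lambda$ is $\mathrm{BIC}(\lambda) = -2\,\ell_n(\hat\beta_\lambda,\hat\gamma_\lambda) + \log(n)\,\mathrm{df}_\lambda$, where $\mathrm{df}_\lambda$ is the number of nonzero estimated coefficients. The specific criterion shown here is an assumption.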

Abstract

The logistic regression model is widely used to investigate the relationship between a binary response variable $Y$ and a set of potential predictors $X_1,\ldots, X_p$ (for example, $Y = 1$ if the outcome occurred and $Y = 0$ otherwise). A problem that then arises is that a proportion of the study subjects cannot experience the outcome of interest, which leads to an excess of zeros in the study sample. This article is concerned with estimating the parameters of the zero-inflated Bernoulli regression model in a high-dimensional setting, i.e. with a large number of regressors. In particular, we use Ridge regression and the Lasso. Both regularize the model by constraining its weights and are useful when the number of predictors is much larger than the number of observations. We establish the existence, consistency and asymptotic normality of the proposed regularized estimator. We then conduct a simulation study to investigate its finite-sample behavior and present an application to real data.
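The estimation procedure described above can be sketched in Python. This is a minimal illustration under explicit assumptions, not the authors' implementation. It makes the following assumptions:

- The model is the two-part formulation $P(Y_i=1)=\mathrm{expit}(Z_i^\top\gamma)\,\mathrm{expit}(X_i^\top\beta)$, where $Z_i$ and $\gamma$ are hypothetical susceptibility covariates and coefficients.
- An elastic-net penalty is applied to the non-intercept coefficients (`alpha=1` for the Lasso, `alpha=0` for Ridge).
- The solver is proximal gradient (ISTA) with backtracking.
- BIC serves as the information criterion.

All function names and the simulated design are hypothetical.

```python
import numpy as np

def sigmoid(u):
    # Logistic function written via tanh, which avoids overflow in exp.
    return 0.5 * (1.0 + np.tanh(0.5 * u))

def neg_loglik(theta, X, Z, y):
    # Average negative log-likelihood of the ASSUMED model
    # P(Y=1 | x, z) = sigmoid(z'gamma) * sigmoid(x'beta).
    beta, gamma = theta[:X.shape[1]], theta[X.shape[1]:]
    q = np.clip(sigmoid(Z @ gamma) * sigmoid(X @ beta), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(q) + (1 - y) * np.log1p(-q))

def grad(theta, X, Z, y):
    # Gradient of neg_loglik with respect to (beta, gamma).
    beta, gamma = theta[:X.shape[1]], theta[X.shape[1]:]
    pi, p = sigmoid(X @ beta), sigmoid(Z @ gamma)
    q = np.clip(p * pi, 1e-12, 1 - 1e-12)
    w = (q - y) / (1 - q)
    return np.concatenate([X.T @ (w * (1 - pi)), Z.T @ (w * (1 - p))]) / len(y)

def penalty(theta, lam, alpha, mask):
    # Elastic-net penalty applied to the penalized (masked) entries only.
    t = theta[mask]
    return lam * (alpha * np.abs(t).sum() + 0.5 * (1 - alpha) * (t @ t))

def prox(v, step, lam, alpha, mask):
    # Proximal operator of step * penalty on the masked entries:
    # soft-thresholding followed by ridge-type shrinkage.
    out = v.copy()
    s = v[mask]
    out[mask] = (np.sign(s) * np.maximum(np.abs(s) - step * lam * alpha, 0.0)
                 / (1.0 + step * lam * (1 - alpha)))
    return out

def fit_ziber(X, Z, y, lam, alpha, n_iter=500):
    # Proximal gradient (ISTA) with backtracking; column 0 of X and of Z
    # is an intercept and is left unpenalized.
    d = X.shape[1] + Z.shape[1]
    mask = np.ones(d, dtype=bool)
    mask[[0, X.shape[1]]] = False
    theta, step = np.zeros(d), 1.0
    for _ in range(n_iter):
        f, g = neg_loglik(theta, X, Z, y), grad(theta, X, Z, y)
        while True:
            new = prox(theta - step * g, step, lam, alpha, mask)
            diff = new - theta
            if neg_loglik(new, X, Z, y) <= f + g @ diff + diff @ diff / (2 * step) or step < 1e-10:
                break
            step *= 0.5
        theta = new
    return theta, mask

def bic(theta, mask, X, Z, y):
    # BIC = -2 * loglik + log(n) * (nonzero penalized coefficients + intercepts).
    n = len(y)
    df = np.count_nonzero(theta[mask]) + np.count_nonzero(~mask)
    return 2 * n * neg_loglik(theta, X, Z, y) + np.log(n) * df

def select_lambda(X, Z, y, alpha, lams):
    # Fit along a grid of lambda values and keep the BIC minimizer.
    fits = [fit_ziber(X, Z, y, lam, alpha) for lam in lams]
    scores = [bic(th, m, X, Z, y) for th, m in fits]
    k = int(np.argmin(scores))
    return lams[k], fits[k][0]

# Hypothetical simulated design: sparse outcome effects, plus structural
# zeros for subjects that are not susceptible.
rng = np.random.default_rng(0)
n, p = 400, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.r_[0.5, 1.0, -1.0, np.zeros(p - 2)]
gamma_true = np.r_[1.0, 0.5, 0.0]
susceptible = rng.random(n) < sigmoid(Z @ gamma_true)
y = (susceptible & (rng.random(n) < sigmoid(X @ beta_true))).astype(float)

lams = [0.001, 0.01, 0.05, 0.1]
lam_hat, theta_hat = select_lambda(X, Z, y, alpha=0.5, lams=lams)
print("selected lambda:", lam_hat)
print("nonzero beta columns:", np.flatnonzero(theta_hat[1:X.shape[1]]) + 1)
```

The likelihood is not concave in $(\beta, \gamma)$. The backtracking step therefore only guarantees that the penalized objective decreases monotonically, not that a global optimum is reached. In practice one would standardize the covariates and compare several starting values.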
