Estimating the Partially Linear Zero-Inflated Poisson Regression Model: a Robust Approach Using a EM-like Algorithm

María José Llop, Andrea Bergesio, Anne-Françoise Yao

TL;DR

This article presents the first robust estimation method specifically developed for the PLZIP model, using an Expectation-Maximization-like algorithm to take advantage of the mixture nature of the model and to address extreme observations in both the response and the covariates.

Abstract

Count data with an excessive number of zeros frequently arise in fields such as economics, medicine, and public health. Traditional count models often fail to handle such data adequately, especially when the relationship between the response and some predictors is nonlinear. To overcome these limitations, the partially linear zero-inflated Poisson (PLZIP) model has been proposed as a flexible alternative. However, all existing estimation approaches for this model are likelihood-based, and likelihood methods are known to be highly sensitive to outliers and to slight deviations from the model assumptions. This article presents the first robust estimation method specifically developed for the PLZIP model. An Expectation-Maximization-like algorithm is used to take advantage of the mixture nature of the model and to address extreme observations in both the response and the covariates. Convergence of the algorithm and consistency of the estimators are proved. A simulation study under various contamination schemes shows the robustness and efficiency of the proposed estimators in finite samples relative to classical estimators. Finally, the methodology is illustrated with a real-data example.
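
The mixture structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard zero-inflated Poisson formulation, with a log link for the Poisson mean and a logit link for the zero-inflation probability; the abstract does not spell these out. The function names `simulate_plzip` and `e_step`, the Gaussian covariates, and the use of the same covariates in both parts are hypothetical choices made for illustration. The E-step shown is the classical likelihood-based one, not the robust variant the paper proposes.

```python
import numpy as np

def simulate_plzip(n, beta, gamma, m, rng):
    """Draw data from a PLZIP-type model under the assumed links."""
    x = rng.normal(size=(n, len(beta)))        # linear covariates (assumed Gaussian)
    t = rng.uniform(size=n)                    # covariate entering nonparametrically
    lam = np.exp(x @ beta + m(t))              # Poisson mean: log link (assumed)
    pi = 1.0 / (1.0 + np.exp(-(x @ gamma)))    # zero-inflation prob: logit link (assumed)
    structural_zero = rng.random(n) < pi       # latent mixture indicator
    y = np.where(structural_zero, 0, rng.poisson(lam))
    return y, x, t, lam, pi

def e_step(y, lam, pi):
    """Classical E-step: posterior probability of a structural zero."""
    # A positive count can only come from the Poisson component.
    w = pi / (pi + (1.0 - pi) * np.exp(-lam))
    return np.where(y == 0, w, 0.0)

rng = np.random.default_rng(0)
y, x, t, lam, pi = simulate_plzip(
    500, np.array([0.5, -0.3]), np.array([0.2, 0.1]), np.sin, rng
)
weights = e_step(y, lam, pi)
```

Here `weights` holds, for each zero observation, the probability that it came from the degenerate (always-zero) component rather than from the Poisson component. A classical EM iteration would use these weights in two weighted regressions, which is the step where likelihood-based fitting becomes sensitive to outliers.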

Paper Structure

This paper contains 12 sections, 5 theorems, 51 equations, 7 figures, 1 table.

Key Result

Theorem 3.1

If there exists a point ${\boldsymbol \theta}^*$ to which the ES algorithm converges, that is, $\lim_{r \rightarrow \infty} \widehat{{\boldsymbol \theta}}_n^{(r)} = {\boldsymbol \theta}^*$, and $\rho$ is such that its derivative $\Psi$ is continuous and satisfies $E_{{\boldsymbol \theta}}(\Psi(y, \ldots$

Figures (7)

  • Figure 1: $L_2$-norm distance from ${\boldsymbol \beta}_0$ to the estimators. The panels a), b), d) and e) represent each contamination scheme. c) and f) are zoomed views of the panels b) and e), respectively.
  • Figure 2: $L_2$-norm distance from ${\boldsymbol \gamma}_0$ to each estimator. The panels a), b), c) and d) represent each contamination scheme.
  • Figure 3: RMSE of each estimator of the nonparametric component $m_0$. The panels a), b), d) and e) represent each contamination scheme. c) and f) are zoomed views of the panels b) and e), respectively.
  • Figure 4: True function $m_0$ and its estimates obtained from a specific simulated dataset under each contamination scheme.
  • Figure 5: Barplot of the number of days of missed primary activities due to illness in the past 4 weeks self-reported by the respondent.
  • ...and 2 more figures

Theorems & Definitions (11)

  • Definition 2.1
  • Remark 2.1
  • Remark 3.1
  • Theorem 3.1
  • Remark 3.2
  • Lemma 4.1
  • Theorem 4.1
  • Remark 4.1
  • Corollary 4.1
  • Theorem 4.2
  • ...and 1 more