Robust variable selection for partially linear additive models

Graciela Boente, Alejandra Martínez

TL;DR

To simultaneously achieve model selection for the parametric component of the model and resistance to outliers, the proposed procedure combines preliminary robust estimators of the additive component with robust linear $MM$-regression estimators that include a penalty, such as SCAD, on the coefficients of the parametric part.
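SCAD is the smoothly clipped absolute deviation penalty of Fan and Li (2001). The minimal Python sketch below (not the authors' code) evaluates the penalty and its closed-form thresholding rule for a single coefficient, which shows why it performs variable selection: small coefficients are set exactly to zero, while large ones are left essentially unshrunk. The default `a = 3.7` is the value suggested by Fan and Li; which value of `a` the paper uses, and how it selects `lam`, is an assumption not covered here.

```python
def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|beta|) of Fan and Li (2001); requires lam >= 0 and a > 2."""
    if lam < 0 or a <= 2:
        raise ValueError("need lam >= 0 and a > 2")
    t = abs(beta)
    if t <= lam:
        # Linear (lasso-like) part near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic piece joining the linear part to the flat part continuously.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant part: large coefficients receive no extra penalty.
    return (a + 1) * lam * lam / 2


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """Minimiser of 0.5 * (z - b)**2 + p_lam(|b|) over b (orthonormal-design SCAD rule)."""
    if lam < 0 or a <= 2:
        raise ValueError("need lam >= 0 and a > 2")
    t = abs(z)
    sign = 1.0 if z >= 0 else -1.0
    if t <= 2 * lam:
        # Soft thresholding: any |z| <= lam is set exactly to zero.
        return sign * max(t - lam, 0.0)
    if t <= a * lam:
        # Transition zone with reduced shrinkage.
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    # Large |z| is returned unchanged, avoiding the bias of the lasso.
    return z


if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 10.0):
        print(z, scad_threshold(z, lam=1.0))
```

With `lam = 1.0`, the input 0.5 is zeroed, 1.5 is shrunk to 0.5, and 10.0 is left untouched.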

Abstract

Among semiparametric regression models, partially linear additive models provide a useful tool for explaining the relationship between the response and a set of explanatory variables, since they include both additive nonparametric components and a parametric component. This paper studies such models under sparsity assumptions on the covariates included in the linear component. Sparse covariates are frequent in regression problems where variable selection is of interest. As in other settings, outliers, either in the residuals or in the covariates involved in the linear component, have a harmful effect. To simultaneously achieve model selection for the parametric component and resistance to outliers, we combine preliminary robust estimators of the additive component with robust linear $MM$-regression estimators that include a penalty, such as SCAD, on the coefficients of the parametric part. Under mild assumptions, we derive consistency results and rates of convergence for the proposed estimators. A Monte Carlo study compares, under different models and contamination schemes, the performance of the robust proposal with its classical counterpart; the results show the advantage of the robust approach. Through the analysis of a real data set, we also illustrate the benefits of the proposed procedure.
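The robust $MM$-regression ingredient can be illustrated with the usual final M-step: starting from a high-breakdown initial fit and a robust residual scale, the coefficients are refined by iteratively reweighted least squares (IRLS) with a bounded loss. The Python sketch below assumes Tukey's bisquare loss with tuning constant c = 4.685 (about 95% efficiency under normal errors), takes the initial estimate `beta0`, the scale `scale` and the partial responses `y_partial` (responses with a preliminary robust fit of the additive component removed) as given, and omits the SCAD penalty. These names and this plug-in scheme are hypothetical simplifications, not the estimator defined in the paper.

```python
import numpy as np


def tukey_rho(r, c=4.685):
    """Tukey bisquare loss, normalised so that rho(r) = 1 for |r| >= c."""
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return 1.0 - (1.0 - u**2) ** 3


def tukey_weights(r, c=4.685):
    """IRLS weights proportional to psi(r)/r: (1 - (r/c)^2)^2 if |r| < c, else 0."""
    u = np.abs(r) / c
    return np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)


def m_step(Z, y_partial, beta0, scale, c=4.685, n_iter=50, tol=1e-8):
    """Refine an initial robust fit by IRLS with the residual scale held fixed.

    Z         : (n, p) design matrix of the linear component.
    y_partial : responses minus a preliminary robust fit of the additive part (hypothetical input).
    beta0     : high-breakdown initial estimate (e.g. an S-estimate), assumed given.
    scale     : robust residual scale from the initial fit, assumed given.
    No SCAD penalty is included in this sketch.
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        r = (y_partial - Z @ beta) / scale
        w = tukey_weights(r, c)
        ZW = Z * w[:, None]  # rows of Z multiplied by their weights
        # Weighted least squares: solve (Z' W Z) beta = Z' W y.
        beta_new = np.linalg.solve(Z.T @ ZW, ZW.T @ y_partial)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Observations whose standardized residuals exceed `c` receive weight zero, which is what keeps gross outliers from influencing the refined fit.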

Paper Structure (12 sections, 5 theorems, 109 equations, 7 figures, 5 tables, 2 algorithms)

Key Result

Theorem 3.1

Let $(Y_i,\mathbf Z_i^{\top},\mathbf X_i^{\top})^{\top}$ be i.i.d. observations satisfying the partially linear additive model (eq:plam), with the errors $\varepsilon_i$ independent of the vector of covariates $(\mathbf Z_i^{\top},\mathbf X_i^{\top})^{\top}$ …
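Theorem 3.1 refers to the model labelled eq:plam, which is not reproduced on this page. For orientation, a partially linear additive model is commonly written as follows; this generic form is an assumption, and the paper's exact parameterization (for instance, a scale parameter multiplying the errors, or the centering used to identify the $\eta_j$) may differ.

```latex
\[
  Y_i = \mu + \mathbf Z_i^{\top}\boldsymbol\beta + \sum_{j=1}^{d} \eta_j(X_{ij}) + \varepsilon_i,
  \qquad i = 1,\dots,n,
\]
```

Here $\mathbf Z_i \in \mathbb R^p$ enters linearly through the (possibly sparse) coefficient vector $\boldsymbol\beta$, $\mathbf X_i = (X_{i1},\dots,X_{id})^{\top}$ enters through unknown smooth univariate functions $\eta_j$, and sparsity means that many entries of $\boldsymbol\beta$ are zero.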

Figures (7)

  • Figure 1: Panels (a) and (b) display the CN$_0$ and CF values, respectively, across all contamination settings. Blue bars correspond to the robust proposal and red bars to the $\textsc{ls}$-estimator.
  • Figure 2: Median over replications of the GMSE for the penalized and oracle estimators across all contamination cases. Red triangles and blue circles correspond to the penalized $\textsc{ls}$-estimator and its robust counterpart, respectively, while pink and light blue squares identify the medians of the oracle least squares and oracle robust estimators, respectively.
  • Figure 3: Adjusted boxplots of the GMSE and OGMSE for the penalized and oracle estimators across all the contamination cases.
  • Figure 4: Pie charts with the proportion of times each value in the grid was selected for contaminations $C_0$ and $C_4$. The gray, purple, blue and pink areas correspond to the values $0$, $0.2$, $0.4$ and $0.6$, respectively.
  • Figure 5: Adjusted boxplots of the MAPE measures for the unpenalized estimators (left) and the penalized estimators (right).
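Figures 3 and 5 use adjusted boxplots. Assuming these are the skewness-adjusted boxplots of Hubert and Vandervieren (2008), available in R as `adjbox` from the robustbase package, the whiskers depend on the medcouple (MC), a robust measure of skewness, instead of a fixed 1.5 IQR rule. The Python sketch below computes a naive O(n²) medcouple and the resulting fences; it illustrates that rule under this assumption and is not the code behind the paper's figures. Its quartiles (`statistics.quantiles` with `method="inclusive"`) may differ slightly from other quartile conventions.

```python
import math
from statistics import median, quantiles


def medcouple(data):
    """Naive O(n^2) medcouple: median of the kernel h over pairs x_i <= med <= x_j."""
    x = sorted(data)
    m = median(x)
    below = [v for v in x if v < m]
    above = [v for v in x if v > m]
    k = len(x) - len(below) - len(above)  # observations tied with the median
    # Tied values belong to both sides; tag them with indices 1..k for the tie kernel.
    lower = [(v, None) for v in below] + [(m, t) for t in range(1, k + 1)]
    upper = [(m, t) for t in range(1, k + 1)] + [(v, None) for v in above]
    kernel = []
    for xi, ti in lower:
        for xj, tj in upper:
            if ti is not None and tj is not None:
                # Both points equal the median: special kernel -1, 0 or +1.
                s = ti + tj - 1
                kernel.append(-1.0 if s < k else (0.0 if s == k else 1.0))
            else:
                kernel.append(((xj - m) - (m - xi)) / (xj - xi))
    return median(kernel)


def adjusted_fences(data, coef=1.5):
    """Lower and upper whisker fences of the skewness-adjusted boxplot."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple(data)
    if mc >= 0:
        # Right skew: longer upper whisker, shorter lower whisker.
        lo = q1 - coef * math.exp(-4 * mc) * iqr
        hi = q3 + coef * math.exp(3 * mc) * iqr
    else:
        # Left skew: the roles of the two exponents are swapped.
        lo = q1 - coef * math.exp(-3 * mc) * iqr
        hi = q3 + coef * math.exp(4 * mc) * iqr
    return lo, hi


if __name__ == "__main__":
    sample = [1, 2, 3, 10, 20]
    print(medcouple(sample))  # 0.75
    print(adjusted_fences(sample))
```

For symmetric data the medcouple is 0 and the fences reduce to the classical $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$; for right-skewed data such as the sample above, the upper whisker is stretched and the lower one shortened.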

Theorems & Definitions (14)

  • Remark 3.1: Comments on assumptions
  • Theorem 3.1
  • Remark 3.2
  • Theorem 3.2
  • Remark 3.3
  • Theorem 3.3
  • Remark 3.4
  • Proof of Theorem 3.1
  • Lemma A.1
  • Proof