Robust variable selection for partially linear additive models

Graciela Boente; Alejandra Martínez

TL;DR

To simultaneously achieve variable selection for the parametric component of the model and resistance to outliers, the proposed procedure combines preliminary robust estimators of the additive components with robust linear $MM$-regression estimators and a penalty, such as SCAD, on the coefficients of the parametric part.

Abstract

Among semiparametric regression models, partially linear additive models provide a useful tool for including additive nonparametric components as well as a parametric component when explaining the relationship between the response and a set of explanatory variables. This paper concerns such models under sparsity assumptions for the covariates included in the linear component. Sparse covariates are frequent in regression problems, where the task of variable selection is usually of interest. As in other settings, outliers either in the residuals or in the covariates involved in the linear component have a harmful effect. To simultaneously achieve model selection for the parametric component of the model and resistance to outliers, we combine preliminary robust estimators of the additive component with robust linear $MM$-regression estimators and a penalty, such as SCAD, on the coefficients in the parametric part. Under mild assumptions, consistency results and rates of convergence for the proposed estimators are derived. A Monte Carlo study is carried out to compare, under different models and contamination schemes, the performance of the robust proposal with that of its classical counterpart. The results show the advantage of using the robust approach. Through the analysis of a real data set, we also illustrate the benefits of the proposed procedure.

Paper Structure (12 sections, 5 theorems, 109 equations, 7 figures, 5 tables, 2 algorithms)


Introduction
The robust penalized estimators
Selection of the penalty parameter
Algorithm
Preliminary estimates of $\mu$ and $\eta_j$
Asymptotic results
Consistency results
Variable selection property
Monte Carlo Study
Real data example
Concluding remarks
Appendix: Proofs

Key Result

Theorem 3.1

Let $(Y_i,\mathbf Z_i^{\mathrm{T}},\mathbf X_i^{\mathrm{T}})^{\mathrm{T}}$ be i.i.d. observations satisfying eq:plam, with the errors $\varepsilon_i$ independent of the vector of covariates $(\mathbf Z_i^{\mathrm{T}},\mathbf X_i^{\mathrm{T}})^{\mathrm{T}}$ ...

Figures (7)

Figure 1: Panels (a) and (b) display the CN$_0$ and CF values, respectively, across all the contamination settings. The blue bars correspond to the results obtained with the robust proposal and the red ones to those of the $\textsc{ls}$-estimator.
Figure 2: Plot of the median over replications of the GMSE for the penalized and oracle estimators across all the contamination cases. The red triangles and blue circles correspond to the penalized $\textsc{ls}$-estimator and its robust counterpart, respectively, while the pink and light blue squares identify the medians of the oracle least-squares and oracle robust estimators, respectively.
Figure 3: Adjusted boxplots of the GMSE and OGMSE for the penalized and oracle estimators across all the contamination cases.
Figure 4: Pie charts with the proportion of times each value in the grid was selected for contaminations $C_0$ and $C_4$. The gray, purple, blue and pink areas correspond to the values $0$, $0.2$, $0.4$ and $0.6$, respectively.
Figure 5: Adjusted boxplots for the mape measures obtained for the estimators without penalization (on the left) and for the penalized (on the right) estimators.
...and 2 more figures

Theorems & Definitions (14)

Remark 3.1: Comments on assumptions
Theorem 3.1
Remark 3.2
Theorem 3.2
Remark 3.3
Theorem 3.3
Remark 3.4
proof: Proof of \ref{teo:1}
Lemma A.1
proof
...and 4 more
