Least trimmed squares regression with missing values and cellwise outliers

Jakob Raymaekers; Peter J. Rousseeuw

Abstract

Regression is the workhorse of statistics, and is often faced with real data that contain outliers. When these are casewise outliers, that is, cases that are entirely wrong or belong to a different population, the issue can be remedied by existing casewise robust regression methods. It is another matter when cellwise outliers occur, that is, suspicious individual entries in the data matrix containing the regressors and the response. We propose a new regression method that is robust to both casewise and cellwise outliers, and handles missing values as well. Its construction allows for skewed distributions. We show that it obeys the first breakdown result for cellwise robust regression. It is also the first such method that is geared to making robust out-of-sample predictions. Its performance is studied by simulation, and it is illustrated on a substantial real dataset.

Paper Structure (25 sections, 22 equations, 27 figures, 4 tables)

Introduction
Preliminaries
Data symmetrization
The cellwise robust MCD method
Least trimmed squares regression
Methodology
Estimating the regression coefficients
Computing out-of-sample predictions
Breakdown
More about the algorithm
Optimizing the regression objective function
Faster symmetrization
Simulation study
Accuracy of coefficients and predictions
Simulation of symmetrization
...and 10 more sections

Figures (27)

Figure 1: A toy example to illustrate the basic idea of the method.
Figure 2: Top: average MD (on log scale) of the estimated coefficients for $n = 400$, $d = 20$, $\varepsilon = 20\%$ of cellwise outliers, and $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_{\hbox{\scriptsize ALYZ}}$ (left) or $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_{\hbox{\scriptsize A09}}$ (right), for normal predictors. Bottom: corresponding MSE, also on log scale.
Figure 3: Like Figure 2, but for exponential predictors.
Figure 4: Like Figure 2, but for lognormal predictors.
Figure 5: Top row: average MD (on log scale) of the estimated coefficients for different symmetrization strategies and normal predictors. The data has dimension $d = 20$, $\varepsilon = 20\%$ of cellwise outliers, and $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_{\hbox{\scriptsize ALYZ}}$ (left) or $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_{\hbox{\scriptsize A09}}$ (right). Middle row: same for exponential predictors. Bottom row: same for lognormal predictors.
...and 22 more figures
