Robust Regression with Student's t: The Role of Degrees of Freedom

Amanda Ng, Shangkai Zhu, Archer Gong Zhang, Nancy Reid

Abstract

Linear regression estimators are known to be sensitive to outliers, and one way to obtain a robust and efficient estimator of the regression parameter is to model the errors with Student's $t$ distribution. In this article, we compare estimators of the degrees of freedom parameter in the $t$ distribution using frequentist and Bayesian methods, and then study the properties of the corresponding estimated regression coefficients. We also compare against approaches recommended in the literature, including fixing the degrees of freedom and robust regression using the Huber loss. Our extensive simulations on both synthetic and real data demonstrate that estimating the degrees of freedom via the adjusted profile log-likelihood approach yields regression coefficient estimators with high accuracy, performing comparably to the maximum likelihood estimator with the degrees of freedom fixed at their true value. These findings provide a detailed synthesis of $t$-based robust regression and underscore a key insight: proper calibration of the degrees of freedom is as crucial as the choice of the robust distribution itself for achieving optimal performance.
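As an illustration of the idea in the abstract, the following is a minimal sketch of $t$-error regression in which the degrees of freedom are chosen by maximizing a profile log-likelihood over a coarse grid. It is not the authors' implementation: it uses the plain profile log-likelihood rather than the adjusted version the abstract recommends, it assumes a single covariate with intercept, and the function names (`t_loglik`, `fit_fixed_nu`, `profile_nu`), the grid, and the simulated data are all hypothetical choices made for this example.

```python
import math
import random


def t_loglik(resid, sigma, nu):
    """Log-likelihood of residuals under scaled Student's t errors."""
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    tail = sum(math.log1p((r / sigma) ** 2 / nu) for r in resid)
    return len(resid) * const - 0.5 * (nu + 1) * tail


def fit_fixed_nu(x, y, nu, iters=100):
    """EM (iteratively reweighted least squares) with nu held fixed.

    Returns intercept, slope, scale, and residuals for y = b0 + b1 * x + error.
    """
    n = len(x)
    w = [1.0] * n  # first pass is ordinary least squares
    for _ in range(iters):
        # M-step: weighted least squares in closed form (one covariate).
        sw = sum(w)
        xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
        b1 = sxy / sxx
        b0 = yb - b1 * xb
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        sigma2 = sum(wi * r * r for wi, r in zip(w, resid)) / n
        # E-step: large residuals receive small weights.
        w = [(nu + 1) / (nu + r * r / sigma2) for r in resid]
    return b0, b1, math.sqrt(sigma2), resid


def profile_nu(x, y, grid):
    """Pick nu on the grid that maximizes the profile log-likelihood."""
    best = None
    for nu in grid:
        b0, b1, sigma, resid = fit_fixed_nu(x, y, nu)
        ll = t_loglik(resid, sigma, nu)
        if best is None or ll > best[0]:
            best = (ll, nu, b0, b1, sigma)
    return best


def t_noise(nu):
    """Draw one Student's t variate as normal / sqrt(chi-square / nu)."""
    return random.gauss(0, 1) / math.sqrt(random.gammavariate(nu / 2, 2) / nu)


# Simulated data (assumed values, for illustration only).
random.seed(1)
true_b0, true_b1, true_nu = 1.0, 2.0, 3.0
x = [random.uniform(0, 10) for _ in range(300)]
y = [true_b0 + true_b1 * xi + t_noise(true_nu) for xi in x]

grid = [1, 2, 3, 4, 5, 8, 12, 20, 50]
ll, nu_hat, b0_hat, b1_hat, sigma_hat = profile_nu(x, y, grid)
print(f"nu_hat={nu_hat}, b0_hat={b0_hat:.3f}, b1_hat={b1_hat:.3f}")
```

A grid search keeps the sketch short and avoids a numerical optimizer; in practice one would likely maximize over a continuous range of $\nu$, and the adjusted profile log-likelihood would add a correction term that is not reproduced here.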

Paper Structure (29 sections, 3 theorems, 37 equations, 15 figures, 9 tables)

Key Result

Theorem 1

For the profile log-likelihood estimator $\hat{\nu}$, both the variance and the bias have order $\mathcal{O}(n^{-1})$.
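In symbols, the statement can be sketched as follows. The setup is an assumption, since this summary does not spell it out: a linear model $y_i = x_i^\top \beta + \sigma \varepsilon_i$ with $\varepsilon_i \sim t_\nu$ and log-likelihood $\ell(\beta, \sigma, \nu)$, with the profile taken over $\beta$ and $\sigma$.

```latex
\ell_p(\nu) = \max_{\beta,\,\sigma} \ell(\beta, \sigma, \nu),
\qquad
\hat{\nu} = \arg\max_{\nu > 0} \ell_p(\nu),
\]
\[
\mathbb{E}(\hat{\nu}) - \nu = \mathcal{O}(n^{-1}),
\qquad
\operatorname{Var}(\hat{\nu}) = \mathcal{O}(n^{-1}).
```

Since the standard deviation is then of order $n^{-1/2}$, the bias is of smaller order than the sampling variability of $\hat{\nu}$ as $n$ grows.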

Figures (15)

  • Figure 1: Student's $t$ distribution has heavier tails than the normal distribution, with the difference decreasing as $\nu$ increases. Asymptotically, as $\nu\rightarrow\infty$, the $t$ distribution converges to the normal distribution.
  • Figure 2: Loss functions for OLS, Huber, and the $t$ distribution
  • Figure 3: Overall RMSE of $\hat{\beta}$ using stackloss data (data generation method 1)
  • Figure 4: Overall RMSE of $\hat{\beta}$ in simulated $t$-error data (data generation method 2)
  • Figure 5: Overall RMSE of $\hat{\beta}$ using simulated normal-error data (data generation method 3)
  • ...and 10 more figures
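The three loss functions named in the Figure 2 caption can be sketched as functions of a standardized residual $r$. The parametrization used in the paper's figure is not shown in this summary, so the choices below are assumptions: unit scale, the $t$ loss taken as the negative log-density up to an additive constant, and the conventional Huber threshold $\delta = 1.345$.

```python
import math


def squared_loss(r):
    """OLS loss: grows quadratically, so outliers dominate the fit."""
    return 0.5 * r * r


def huber_loss(r, delta=1.345):
    """Huber loss: quadratic near zero, linear beyond the threshold delta."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def t_loss(r, nu=3.0):
    """Negative log-density of Student's t (unit scale), constant dropped.

    Grows only logarithmically in |r|; approaches squared_loss as nu -> inf.
    """
    return 0.5 * (nu + 1) * math.log1p(r * r / nu)


# Compare the three losses at a small and a large residual.
for r in (0.5, 10.0):
    print(r, squared_loss(r), huber_loss(r), t_loss(r))
```

For a large residual the ordering is $t$ loss, then Huber, then squared loss, which is why an outlier pulls the $t$-based fit least. Letting `nu` grow makes `t_loss` approach `squared_loss`, matching the convergence to the normal distribution described for Figure 1.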

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Corollary 1.1
  • proof
  • Theorem 2
  • proof