Reconciliating Bayesian and frequentist approaches to robustness against outliers

Philippe Gagnon, Alain Desgagné

Abstract

Heavy-tailed models are used as a way to gain robustness against outliers in Bayesian analyses. In frequentist analyses, M-estimators are often employed. In this paper, the two approaches are tentatively reconciled by considering M-estimators as maximum likelihood estimators of heavy-tailed models. From this perspective, a fundamental difference emerges: frequentists, contrary to Bayesians, do not require these heavy-tailed models to be proper. For instance, a popular robust estimator in linear regression, Tukey's biweight M-estimator, does not correspond to a proper heavy-tailed model. Thus, a Bayesian practitioner does not have access to the same range of tools as a frequentist practitioner. It is shown through two real-data linear regression analyses that the former may consequently obtain estimation results that differ significantly from those of the latter, the difference being due to a more pronounced influence of the outliers in the former case. One way to give these practitioners access to the same range of tools is for the Bayesian to adopt the generalized Bayesian framework of Bissiri et al. (2016), which allows the use of improper models (Jewson and Rossell, 2022), in combination with proper prior distributions yielding proper generalized posterior distributions. A complete reconciliation of the Bayesian and frequentist approaches to robustness is then achieved. An extensive theoretical study of the generalized Bayesian counterpart of Tukey's biweight M-estimator is provided, which includes a robustness characterization result and a Bernstein–von Mises result, the latter allowing the generalized posterior distribution to be calibrated for meaningful uncertainty quantification. After adopting the generalized Bayesian framework, the Bayesian practitioner obtains results similar to those of the frequentist practitioner in the aforementioned examples.
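
The correspondence between M-estimators and maximum likelihood can be sketched as follows. The notation is an assumption made for illustration and may differ from the paper's exact parametrization: the scale is fixed to 1, $\varrho$ denotes the loss function, $k$ its tuning constant, and $g$ the implied error density. The form of Tukey's $\varrho$ used here is the standard textbook one.

```latex
% M-estimation read as maximum likelihood under the density g (scale fixed to 1)
\[
\hat{\boldsymbol\beta}
  = \arg\min_{\boldsymbol\beta} \sum_{i=1}^n \varrho\big(y_i - \mathbf{x}_i^T \boldsymbol\beta\big)
  = \arg\max_{\boldsymbol\beta} \prod_{i=1}^n g\big(y_i - \mathbf{x}_i^T \boldsymbol\beta\big),
\qquad
g(\varepsilon) \propto \exp\{-\varrho(\varepsilon)\}.
\]

% Tukey's biweight (standard form): \varrho is constant beyond k, so g has flat tails
\[
\varrho(\varepsilon) = \frac{k^2}{6} \ \text{ for } |\varepsilon| > k
\quad\Longrightarrow\quad
\int_{-\infty}^{\infty} \exp\{-\varrho(\varepsilon)\}\, d\varepsilon
  \;\ge\; \int_{|\varepsilon| > k} \exp\{-k^2/6\}\, d\varepsilon = \infty .
\]
```

Because the integral diverges, $g$ cannot be normalized into a probability density, which is the sense in which the model is improper. The minimization problem on the left can still be posed, which is why the frequentist estimator is unaffected.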

Paper Structure (23 sections, 7 theorems, 105 equations, 11 figures, 1 table)

Key Result

Proposition 1

Suppose that the prior distribution $\pi$ is proper and that $\hat{\sigma}_{\text{TM}} \in (0, \infty)$ for the data set $\{\mathbf{x}_i, y_i\}_{i=1}^n$ at hand. Then, the generalized posterior distribution defined through eq:postTukey is proper. Additionally, the moments of order $\kappa \in \

Figures (11)

  • Figure 1: (a) Estimation of a simple linear regression based on the shock data set using OLS and Tukey's biweight M-estimator, as well as the maximum a posteriori estimate of a Bayesian LPTN model. (b) Weight assigned to each data point in Tukey's biweight M-estimation and Bayesian LPTN model estimation.
  • Figure 2: Results of a Monte Carlo study with $p = 2$, $\boldsymbol\beta_0 = (1, 1)^T$, $\sigma_0 = 1$, and $\mu_{\mathbf{X}}$ and $f_0$ corresponding to the standard normal. For each value of $n$, 1,000 data sets are simulated, and for each data set, estimates are computed under Tukey's biweight improper model and the normal model. In (a), the MAP estimate of $\beta_2$ is shown. In (b), the HPD CI length for $\beta_2$ is shown. In (c), we present the coverage of the true coefficient value $\beta_{0, 2} = 1$ by the HPD CI.
  • Figure 3: Standardized residuals against fitted values computed from (a) Tukey's biweight estimates, (b) Bayesian LPTN estimates, and (c) OLS estimates.
  • Figure 4: $\varrho$ as a function of $\varepsilon$ when $\varrho$ is the quadratic function and when it is the function associated with the Huber M-estimator with $k = 1.345$.
  • Figure 5: $\varrho$ as a function of $\varepsilon$ when $\varrho$ is the function associated with Tukey's biweight M-estimator with $k = 4.685$.
  • ...and 6 more figures
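
The $\varrho$ functions named in the captions of Figures 4 and 5 can be sketched numerically. This is a minimal illustration that assumes the standard textbook forms of the Huber and Tukey biweight losses; the paper's own normalization may differ. The function names and the helper `truncated_mass` are hypothetical. The helper integrates $\exp(-\varrho)$ over a growing window to suggest why the Huber loss corresponds to a proper density while Tukey's does not.

```python
import numpy as np


def rho_quadratic(eps):
    """Quadratic loss (least squares): eps^2 / 2."""
    eps = np.asarray(eps, dtype=float)
    return 0.5 * eps**2


def rho_huber(eps, k=1.345):
    """Standard Huber loss: quadratic within [-k, k], linear outside."""
    eps = np.asarray(eps, dtype=float)
    a = np.abs(eps)
    return np.where(a <= k, 0.5 * eps**2, k * a - 0.5 * k**2)


def rho_tukey(eps, k=4.685):
    """Standard Tukey biweight loss: bounded, constant (k^2 / 6) beyond k."""
    eps = np.asarray(eps, dtype=float)
    inside = (k**2 / 6.0) * (1.0 - (1.0 - (eps / k) ** 2) ** 3)
    return np.where(np.abs(eps) <= k, inside, k**2 / 6.0)


def truncated_mass(rho, half_width, n_grid=200_001):
    """Trapezoidal integral of exp(-rho) over [-half_width, half_width]."""
    grid = np.linspace(-half_width, half_width, n_grid)
    vals = np.exp(-rho(grid))
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)


if __name__ == "__main__":
    for w in (50.0, 100.0, 200.0):
        print(w, truncated_mass(rho_huber, w), truncated_mass(rho_tukey, w))
```

Under these assumptions, the Huber integral stabilizes as the window widens, whereas the Tukey integral keeps growing roughly linearly with the window, consistent with the abstract's statement that Tukey's biweight M-estimator does not correspond to a proper model.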

Theorems & Definitions (15)

  • Remark 1
  • Proposition 1
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Proof of \ref{prop:proper}
  • Proof of \ref{Thm:robustness}
  • Proof of \ref{lemma:large-sample}
  • Proof of \ref{Thm:large-sample}
  • Proposition 2
  • ...and 5 more