Generalized Resubstitution for Regression Error Estimation

Diego Marcondes, Ulisses Braga-Neto

Abstract

We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measures and loss function. The usual sum of squares criterion is a special case corresponding to the standard empirical probability measure and the quadratic loss. Other choices of empirical probability measure lead to more general estimators with superior bias and variance properties. We prove that these error estimators are consistent under broad assumptions. In addition, procedures for choosing the empirical measure based on the method of moments and maximum pseudo-likelihood are proposed and investigated. Detailed experimental results using polynomial regression demonstrate empirically the superior finite-sample bias and variance properties of the proposed estimators. The R code for the experiments is provided.
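The special case named in the abstract can be made concrete with a minimal sketch. It assumes, for illustration only, that the empirical measure is an equal-weight mixture of one measure per sample point and that expectations are approximated by Monte Carlo draws. The names `generalized_resubstitution`, `point_mass`, and `gaussian_in_x`, and the toy sample, are hypothetical and are not taken from the authors' R code.

```python
import random


def quadratic_loss(prediction, target):
    """Quadratic loss between a prediction and a target."""
    return (prediction - target) ** 2


def generalized_resubstitution(predict, sample, draw, loss=quadratic_loss,
                               n_draws=1, seed=0):
    """Average loss under an equal-weight mixture of per-point measures.

    draw(rng, x, y) must return one pair (x', y') sampled from the measure
    attached to the data point (x, y). This mixture form is an assumption
    made for this sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    for x, y in sample:
        for _ in range(n_draws):
            x_new, y_new = draw(rng, x, y)
            total += loss(predict(x_new), y_new)
    return total / (len(sample) * n_draws)


def point_mass(rng, x, y):
    """Standard empirical measure: all mass sits on the observed point."""
    return x, y


def gaussian_in_x(sigma):
    """Spread each point along x with a Gaussian of standard deviation sigma."""
    def draw(rng, x, y):
        return rng.gauss(x, sigma), y
    return draw


# Toy data and predictor, chosen only for illustration.
sample = [(-1.0, 1.2), (0.0, -0.1), (1.0, 0.9), (2.0, 4.3)]


def predict(x):
    return x * x


# With point masses the estimator is the mean squared residual.
resub = generalized_resubstitution(predict, sample, point_mass)

# With Gaussian measures in x it also accounts for the loss near each point.
bolstered = generalized_resubstitution(predict, sample, gaussian_in_x(0.1),
                                       n_draws=20000)
```

With `point_mass` the function returns the usual mean squared residual, so the sum of squares criterion is recovered up to the factor $1/n$. Swapping in another `draw` function changes only the measure, not the loss or the predictor.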

Paper Structure

This paper contains 24 sections, 26 theorems, 136 equations, 8 figures, 1 algorithm.

Key Result

Proposition 3

Fix a loss function $\ell$, a prediction rule $\Psi_{n}$ and a collection $\mathscr{B}(S_{n})$ of probability measures for each sample $S_{n} \in \mathcal{Z}^{n}$. If $\hat{\varepsilon}_{n}^{\mathscr{B}}$ is consistent and finite_moments_Q holds, then

Figures (8)

  • Figure 1: An illustration of Gaussian bolstering for regression. (A) An example in which $\psi_{n}(x) = x^2$ with the points $(0,0.05)$ and $(0,-0.05)$ outlined in red and blue, respectively. The predictor $\psi_{n}$ has the same quadratic loss at both points. The shaded area represents the distance from $\psi_{n}(x)$ to $0.05$ (red) and $-0.05$ (blue). (B) The squared distance from $\psi_{n}(x)$ to $0.05$ (red) and $-0.05$ (blue) in (A) weighted by a Gaussian density with mean zero and standard deviation $0.1$. The contribution of each point to the Gaussian bolstering estimator is the area under the respective curve, and hence that of the blue point is greater. (C) Points $(x,y)$ generated by $y = \psi^{\star}(x) + \epsilon$, in which $\psi^{\star}$ is the dashed blue curve and $\epsilon$ is Gaussian noise. The shaded regions represent $X_{i} \pm 3\sigma$, in which $\sigma$ is the standard deviation of the Gaussian bolstering distribution. The polynomial $\psi_{n}$ (black) interpolates the data, so its resubstitution error is zero. Nevertheless, the Gaussian bolstering error estimator is not zero and equals essentially the mean distance from $\psi_{n}(x)$ to $Y_{i}$ for $x$ in the shaded neighborhood of $X_{i}$ when $x$ follows a Gaussian distribution with mean $X_{i}$ and standard deviation $\sigma$.
  • Figure 2: An illustration of posterior-probability resubstitution for Bayesian regression. (A) An example in which $\psi_{n}(x) = x^2$ with the points $(0,0.05)$ and $(0.12,-0.0356)$ outlined in red and blue, respectively. The predictor $\psi_{n}$ has the same quadratic loss at both of these points. (B) The posterior probability of $Y$ given $X_{i} = 0$ (red) and $X_{i} = 0.12$ (blue). The vertical dashed lines represent the respective values of $\psi_{n}(X_{i})$. The contribution of each point to $\varepsilon_{n}^{ppr}$ is the expected value of $(\psi_{n}(X_{i}) - Y)^2$ under the respective posterior probability, that is, the expected squared distance from $Y$ to the respective dashed line. The contribution of the blue point is greater than that of the red one.
  • Figure 3: An illustration of the $XY$-Gaussian bolstering error estimator with the same data as in Figure 1. The contribution of each data point $(X_{i},Y_{i})$ to the bolstered error estimator is the expected squared distance of $Y$ to $\psi_n(X)$ for $(X,Y)$ following a Gaussian distribution with mean $(X_{i},Y_{i})$ and covariance matrix $\Sigma_{i}$. The ellipses in (A) and (B) are level curves of the respective Gaussian distribution and illustrate the form of $\Sigma_{i}$.
  • Figure 4: Illustration of the estimation of $\sigma_{S_{n}}$ by the method of moments. (A) A sample of $X$ in $d = 2$ dimensions with $\bar{\delta}_{S_n} = 0.24$. (B) A grid search to estimate $\sigma_{S_{n}}$ by solving the method-of-moments equation with $\Sigma = I_{d}$. The expectation $\mathbb{E}[\delta(\hat{X})]$ is computed via Monte Carlo integration for each $\sigma_{S_{n}}$ in a grid, and the estimate $\hat{\sigma}_{S_{n}}$ is the grid value whose $\mathbb{E}[\delta(\hat{X})]$ is closest to $\bar{\delta}_{S_n}$, which here is $\hat{\sigma}_{S_{n}} = 0.31$.
  • Figure 5: Example of method of moments and maximum pseudo-likelihood estimators for the kernel. The plots present the level curves of the Gaussian distributions with kernel estimated by the method of moments, exact (dashed) and approximated (dotted), and by maximum pseudo-likelihood estimation (solid) in (A) Gaussian bolstering and (B) $XY$-Gaussian bolstering. All the level curves are such that the probability inside the sphere/ellipse equals $0.05$.
  • ...and 3 more figures
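
The comparison described in panels (A) and (B) of Figure 1 can be checked numerically. The sketch below assumes the setting stated in the caption: $\psi_{n}(x) = x^2$, quadratic loss, and a Gaussian bolstering density in $x$ with mean zero and standard deviation $0.1$. The contribution of a point $(0, y)$ is then $\mathbb{E}[(X^2 - y)^2] = 3\sigma^4 - 2y\sigma^2 + y^2$, which follows from $\mathbb{E}[X^2] = \sigma^2$ and $\mathbb{E}[X^4] = 3\sigma^4$. The function names are hypothetical, and the Monte Carlo version is included only as a cross-check.

```python
import random


def contribution_exact(y, sigma):
    """E[(X^2 - y)^2] for X ~ N(0, sigma^2), via the Gaussian moments
    E[X^2] = sigma^2 and E[X^4] = 3 * sigma^4."""
    return 3 * sigma ** 4 - 2 * y * sigma ** 2 + y ** 2


def contribution_mc(y, sigma, n_draws=100000, seed=0):
    """Monte Carlo approximation of the same expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(0.0, sigma)
        total += (x * x - y) ** 2
    return total / n_draws


sigma = 0.1
red_exact = contribution_exact(0.05, sigma)    # point (0, 0.05)
blue_exact = contribution_exact(-0.05, sigma)  # point (0, -0.05)
red_mc = contribution_mc(0.05, sigma)
blue_mc = contribution_mc(-0.05, sigma)
```

Both points have the same plain quadratic loss, $0.05^2 = 0.0025$, but the bolstered contributions differ, $0.0018$ for the red point against $0.0038$ for the blue one, which agrees with the caption's statement that the blue point contributes more.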

Theorems & Definitions (40)

  • Definition 1
  • Definition 2
  • Proposition 3
  • Theorem 4
  • Proposition 5
  • Corollary 6
  • Corollary 7
  • Proposition 8
  • Proposition 9
  • Corollary 10
  • ...and 30 more