Outrigger local polynomial regression

Elliot H. Young, Rajen D. Shah, Richard J. Samworth

Abstract

Standard local polynomial estimators of a nonparametric regression function employ a weighted least squares loss function that is tailored to the setting of homoscedastic Gaussian errors. We introduce the outrigger local polynomial estimator, which is designed to achieve distributional adaptivity across different conditional error distributions. It modifies a standard local polynomial estimator by employing an estimate of the conditional score function of the errors and an 'outrigger' that draws on the data in a broader local window to stabilise the influence of the conditional score estimate. Subject to smoothness and moment conditions, and only requiring consistency of the conditional score estimate, we first establish that even under the least favourable settings for the outrigger estimator, the asymptotic ratio of the worst-case local risks of the two estimators is at most $1$, with equality if and only if the conditional error distribution is Gaussian. Moreover, we prove that the outrigger estimator is minimax optimal over Hölder classes up to a multiplicative factor $A_{\beta,d}$, depending only on the smoothness $\beta\in (0,\infty)$ of the regression function and the dimension $d$ of the covariates. When $\beta\in (0,1]$, we find that $A_{\beta,d} \leq 1.69$, with $\lim_{\beta\searrow 0} A_{\beta,d} = 1$. A further attraction of our proposal is that we do not require structural assumptions such as independence of errors and covariates, or symmetry of the conditional error distribution. Numerical results on simulated and real data validate our theoretical findings; our methodology is implemented in R and available at https://github.com/elliot-young/outrigger.
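For orientation, the baseline that the abstract refers to, a standard local polynomial estimator fitted by kernel-weighted least squares, can be sketched in a few lines. The Python sketch below is illustrative only: it is not the authors' R implementation and does not include the score estimate or the outrigger. The function names, the choice of the Epanechnikov kernel and the restriction to a single covariate ($d=1$) are assumptions made for this example.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * max(1 - u^2, 0)."""
    u = np.asarray(u, dtype=float)
    return 0.75 * np.maximum(1.0 - u**2, 0.0)


def local_polynomial_fit(x, y, x0, h, degree=1):
    """Standard local polynomial estimate of f(x0) for one covariate.

    Minimises sum_i K((x_i - x0) / h) * (y_i - sum_j b_j (x_i - x0)^j)^2
    over the coefficients b and returns b_0, the fitted value at x0.
    (Hypothetical helper for illustration, not part of the outrigger package.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = epanechnikov((x - x0) / h)
    # Design matrix with columns 1, (x - x0), ..., (x - x0)^degree.
    design = np.vander(x - x0, N=degree + 1, increasing=True)
    # Weighted least squares: scale rows by the square root of the weights.
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef[0]


# Small demonstration on simulated data with heavy-tailed (t_3) errors.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 1.0, size=2000)
y_demo = np.sin(2.0 * np.pi * x_demo) + 0.3 * rng.standard_t(df=3, size=2000)
print(local_polynomial_fit(x_demo, y_demo, x0=0.35, h=0.05, degree=0))
```

The squared-error loss in this sketch is exactly what makes the baseline best suited to homoscedastic Gaussian errors; with heavy-tailed errors such as those in the demonstration, the fit remains valid but does not exploit the shape of the error distribution.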

Paper Structure (26 sections, 34 theorems, 335 equations, 8 figures, 1 table, 1 algorithm)

Key Result

Theorem 1

Suppose that $\mathcal{P}$ satisfies Assumption \ref{ass:DGM} and that $\hat{f}^{\mathrm{Outrig}}$ in Algorithm \ref{alg:outrigger} satisfies Assumption \ref{ass:bandwidth-kernels-and-co}. Then for each $x_0\in\mathcal{X}$, [...] where [...] and where the deterministic quantity $B(f,x_0,K,h)$ is defined in Appendix \ref{appsec:proof-decomp} and satisfies $B(f,x_0,K,h) = O_{\mathcal{P},\mathcal{X},\mathcal{H}}(1)$.

Figures (8)

  • Figure 1: Kernel density estimates of $\hat{f}(0) - f(0)$ for the simulation example of Section \ref{sec:numerics-nonindep-errors}\ref{item:exp-t3} for different estimators $\hat{f}$, based on 1000 repetitions with sample size $n=10^4$. A standard local constant estimator (black) does not adapt to the unknown (non-Gaussian) error distribution, so its variance is larger than that of the oracle local likelihood estimator \ref{eq:rho-est-oracle} (dashed yellow) that exploits knowledge of the conditional score function $\rho$. The estimator \ref{eq:errorO} based on a naive distributional plug-in estimator $\hat{\rho}$ of $\rho$ (green) reduces variance compared with the standard local polynomial estimator, at the expense of a significant additional bias. On the other hand, our outrigger estimator (orange) enjoys a very similar reduction in variance to the distributional plug-in estimator, and a similar bias to those of the oracle and standard local polynomial estimators. The mean squared error of each estimator is given in the adjoining table.
  • Figure 2: Illustration of a local constant estimator ($p=0$) in the single covariate ($d=1$) case at $x_0=0.35$ with bandwidth $h=0.05$ and outrigger parameter $\lambda=5$. The solid black curve is the true regression function. The green line shows the outrigger local constant fit at $x_0$. The bottom diagram shows the orthogonal combination of 'inner and outer region' kernels about $x_0$. In our schematic, $K(\nu)=\tfrac{3}{4}\max(1-\nu^2,0)$ is the Epanechnikov kernel and $\kappa_\lambda(\nu)=\tfrac{1}{2(\lambda-1)}\mathbbm{1}_{\mathcal{B}_{x_0}(\lambda)\setminus\mathcal{B}_{x_0}(1)}$ is a uniform kernel. The red points lie within the inner region $\mathcal{B}_{x_0}(h)$ and contribute to our estimating equation via the kernel $K_h(x-x_0)$. The blue points lie within the outer region $\mathcal{B}_{x_0}(\lambda h)\setminus\mathcal{B}_{x_0}(h)$ and contribute to our estimating equation via the kernel $\kappa_{h,\lambda}(x-x_0)$; the scaling $\mu(x_0)$ is such that the expectation of the function in the bottom diagram is zero. Each of $K$ and $\kappa_\lambda$ is a kernel of order 2. The grey points do not contribute to the estimator at $x_0$.
  • Figure 3: Pointwise mean squared error $\mathbb{E}\bigl\{\bigl(\hat{f}(0)-f(0)\bigr)^2\bigr\}$ in the numerical experiments of Section \ref{sec:sim-indeperrors} over different bandwidths $h$.
  • Figure 4: Empirical and theoretical comparison of the ratio of the mean squared error (MSE) of the outrigger estimator with parameter $\lambda\in[\lambda_0(K),20]$ and the standard local polynomial estimator for the experiments of Sections \ref{sec:sim-indeperrors}\ref{item:scale-mix} and \ref{sec:sim-indeperrors}\ref{item:loc-mix}. The empirical curve plots $\lambda\mapsto\frac{\mathrm{MSE} \, \hat{f}^{\mathrm{Outrig}}(x_0)}{\mathrm{MSE} \, \hat{f}^{\mathrm{LP}}(x_0)}$, with MSEs estimated over 1000 repetitions. The theoretical MSE ratio anticipated by the bias and variance terms in Theorem \ref{thm:decomp} is $\lambda\mapsto\bigl({V_{P}^{(\lambda)}(x_0)}/{\sigma_P^2(x_0)}\bigr)^{2/3}$, with theoretical limit $\lim_{\lambda\to\infty}\bigl({V_{P}^{(\lambda)}(x_0)}/{\sigma_P^2(x_0)}\bigr)^{2/3}=\Bigl(\frac{1/i_P(x_0)}{\sigma_P^2(x_0)}\Bigr)^{2/3}$.
  • Figure 5: Surface plots of the conditional score functions $\rho(\varepsilon\,|\, x)$ for the three data generating mechanisms in Section \ref{sec:numerics-nonindep-errors}.
  • ...and 3 more figures

Theorems & Definitions (65)

  • Theorem 1
  • Corollary 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Proof of Theorem \ref{thm:decomp}
  • Lemma 8
  • proof
  • ...and 55 more