Nonparametric Regression for Random Unbiased Perturbations

Anna Lyubarskaja, Dominik Rothenhäusler

TL;DR

This work introduces random unbiased perturbations (RUPs) as dataset-level, mean-zero shifts in the conditional law $Y|X$ with fixed covariate distribution $P^X$, distinct from adversarial or per-sample shifts. It derives an extended bias–variance decomposition that adds a distributional variance term, and shows that RUPs effectively reduce the information available to an estimator through an effective sample size $n_{\mathrm{eff}} = n/(1+n\tau)$, where $\tau$ captures perturbation strength and correlation. For local polynomial estimators, the paper establishes that optimal bandwidth scales with $h^\star \asymp (\tfrac{1}{n}+\tau)^{1/(2\beta+1)}$, leading to convergence rates in terms of $n_{\mathrm{eff}}$ and, in certain regimes, to perturbation-dominated scaling $h^\star \propto \tau^{1/(2\beta+1)}$. Minimax lower bounds show these rates are fundamental under RUPs, illustrating a new uncertainty regime that shapes both tuning and limits. The results offer practical guidance for bandwidth selection and evaluation under distributional randomness and open avenues for extending the RUP framework to other nonparametric tools and problems.
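The bandwidth scaling quoted above can be recovered by a heuristic balancing argument. What follows is a sketch rather than the paper's proof: it assumes a one-dimensional covariate, a squared bias of order $h^{2\beta}$, and sampling and distributional variances of order $1/(nh)$ and $\tau/h$ respectively. The constants $C_1, C_2$ are placeholders and do not come from the paper.

$$
\mathrm{MSE}(h) \;\lesssim\; C_1 h^{2\beta} + \frac{C_2}{h}\left(\frac{1}{n} + \tau\right)
= C_1 h^{2\beta} + \frac{C_2}{n_{\mathrm{eff}}\, h},
\qquad \frac{1}{n_{\mathrm{eff}}} = \frac{1 + n\tau}{n} = \frac{1}{n} + \tau .
$$

$$
h^{2\beta+1} \asymp \frac{1}{n} + \tau
\;\Longrightarrow\;
h^\star \asymp \left(\frac{1}{n} + \tau\right)^{1/(2\beta+1)} = n_{\mathrm{eff}}^{-1/(2\beta+1)},
\qquad
\mathrm{MSE}(h^\star) \asymp n_{\mathrm{eff}}^{-2\beta/(2\beta+1)} .
$$

Under these assumptions, $\tau \gg 1/n$ gives $n_{\mathrm{eff}} \approx 1/\tau$, which recovers the perturbation-dominated scaling $h^\star \propto \tau^{1/(2\beta+1)}$.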

Abstract

We study nonparametric regression with covariates $X$ and outcome $Y$ under random unbiased perturbations (RUPs) of the conditional distribution $Y|X$, where the marginal distribution of covariates, $P^X$, remains fixed but the conditional law, $P^{Y|X}$, varies randomly across datasets. Unlike adversarial distribution shift frameworks that yield conservative worst-case guarantees, RUPs induce dataset-level variance inflation rather than systematic bias. We provide examples of RUPs and show that this distributional uncertainty reduces the effective sample size to $n_{\mathrm{eff}} = n/(1 + n\tau)$, where $\tau \in [0,1]$ quantifies the perturbation strength. For local polynomial estimators, we derive an extended bias–variance decomposition that includes a distributional variance term with the same bandwidth scaling as classical sampling variance. This leads to a modified bandwidth selection principle: when distributional uncertainty dominates sampling uncertainty ($\tau \gg 1/n$), optimal bandwidths scale as $\tau^{1/(2\beta+1)}$ rather than the usual $n^{-1/(2\beta+1)}$, where $\beta$ indicates the smoothness of the function class considered. We also establish matching minimax lower bounds showing that there exists an RUP for which this effective sample size $n_{\mathrm{eff}}$ is fundamental. Our results demonstrate that random dataset-level perturbations create a distinct mode of uncertainty that affects both practical tuning and fundamental statistical limits.
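To make the effective sample size concrete, the following minimal Python sketch evaluates $n_{\mathrm{eff}} = n/(1+n\tau)$ and the bandwidth scale $(1/n+\tau)^{1/(2\beta+1)}$ for a few sample sizes. The function names, the leading constant `c`, and the example values of `tau` and `beta` are illustrative assumptions, not taken from the paper; the rates above hold only up to constants.

```python
def effective_sample_size(n: int, tau: float) -> float:
    """Effective sample size n_eff = n / (1 + n * tau) from the abstract."""
    return n / (1.0 + n * tau)


def bandwidth_scale(n: int, tau: float, beta: float, c: float = 1.0) -> float:
    """Bandwidth scale c * (1/n + tau)^(1 / (2*beta + 1)).

    The constant c is a hypothetical placeholder: the stated rates
    determine the bandwidth only up to a constant factor.
    """
    return c * (1.0 / n + tau) ** (1.0 / (2.0 * beta + 1.0))


# Illustrative values (assumptions, not from the paper).
tau = 0.01   # perturbation strength
beta = 2.0   # smoothness of the function class

for n in (100, 10_000, 1_000_000):
    n_eff = effective_sample_size(n, tau)
    h_rup = bandwidth_scale(n, tau, beta)
    h_iid = bandwidth_scale(n, 0.0, beta)  # classical n^(-1/(2*beta+1)) scale
    print(f"n={n:>9}  n_eff={n_eff:8.2f}  h_rup={h_rup:.4f}  h_iid={h_iid:.4f}")
```

As $n$ grows with $\tau > 0$ fixed, `n_eff` approaches but never exceeds $1/\tau$, and the RUP bandwidth scale levels off near $\tau^{1/(2\beta+1)}$ while the classical scale keeps shrinking.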

Paper Structure

This paper contains 25 sections, 8 theorems, 51 equations, 6 figures.

Key Result

Proposition 2.1

Let $\Xi$ be the law of iid positive weights $\{\xi_{ij}\}_{i \in [B_X], j \in [B_{\varepsilon}]}$, and let $\mathbb{E}[\xi^{-3}] < \infty$ (e.g. $\xi \sim \mathrm{Exp}(1)$). The partition model produces a random unbiased perturbation of $P_0$, $(\Xi, (P_{\xi})_\xi)$ with: where $I_x$ represents the $P_0^X$-quantile containing $x$.

Figures (6)

  • Figure 1: Examples of points sampled adversarially (left, red) and from an RUP (right, blue). The two distribution shifts have the same KL divergence $\kappa$ from the baseline distribution $P_0$ (sampled in gray). However, the adversarial conditional mean (left, red) provides a much poorer approximation of the target $P_0$ conditional mean (black, dashed), whereas the RUP conditional mean (right, blue) remains much closer to the true relationship. The shaded gray divergence ball illustrates the range of conditional means that distributions at KL divergence $\kappa$ from $P_0$ can take.
  • Figure 2: Schematic bias–variance tradeoff under random perturbations. Bias decreases with model complexity, while sampling variance and distributional variance both increase. The additional distributional variance raises the total RUP-adjusted curve and shifts the optimal complexity leftward (toward more regularized models).
  • Figure 3: The partition model. The original density $P_0$ (shown in (a)) is split into $X$- and $\varepsilon$-quantiles, each of which is assigned an i.i.d. (normalized) random weight (shown in (b)). The resulting RUP (shown in (c)) is the product of $P_0$ and these weights.
  • Figure 4: The correlated noise model. The original density $P_0$ (shown in (a)) is split into $X$-quantiles, each of which is assigned an i.i.d. random shift (shown in (b)). The resulting RUP (shown in (c)) is the original $P_0$ shifted by (b).
  • Figure 5: Simulation illustrating the predicted behavior from Figure 2. Under i.i.d. sampling assumptions, a model would choose a smaller bandwidth and hence a higher complexity model. As random distribution shift is introduced, the optimal model complexity decreases to account for the additional variance.
  • ...and 1 more figures

Theorems & Definitions (18)

  • Definition 1: Random unbiased perturbation of $P_0$
  • Remark 2.1: On the correlation kernel
  • Remark 2.2
  • Proposition 2.1
  • Proposition 2.2
  • Proposition 3.1: based on Tsybakov (2009), Proposition 1.13
  • Theorem 3.1
  • Corollary 3.1
  • Remark 4.1
  • Theorem 4.1
  • ...and 8 more