Consistency of the $k$-Nearest Neighbor Regressor under Complex Survey Designs

Caren Hasler

Abstract

We study the consistency of the $k$-nearest neighbor regressor under complex survey designs. While consistency results for this algorithm are well established for independent and identically distributed data, corresponding results for complex survey data are lacking. We show that the $k$-nearest neighbor regressor is consistent under regularity conditions on the sampling design and the distribution of the data. We derive lower bounds for the rate of convergence and show that these bounds exhibit the curse of dimensionality, as in the independent and identically distributed setting. Empirical studies based on simulated and real data illustrate our theoretical findings.
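As a rough illustration of the setting described in the abstract, the sketch below draws a sample from a simulated finite population and predicts at a point by averaging the responses of its $k$ nearest sample units. It is a minimal sketch under stated assumptions, not the paper's estimator: the design is simple random sampling without replacement, the average is unweighted, and the function name `knn_regress`, the simulated population, and the values of `N`, `n`, and `k` are all hypothetical choices made for illustration. The estimator studied in the paper may treat the sampling design differently.

```python
import math
import random


def knn_regress(x_sample, y_sample, x0, k):
    """Mean response of the k sample units nearest to x0 (Euclidean distance)."""
    # Rank sample units by their distance to the query point x0.
    order = sorted(range(len(x_sample)), key=lambda i: math.dist(x_sample[i], x0))
    nearest = order[:k]
    # Unweighted average (an assumption; no design weights are used here).
    return sum(y_sample[i] for i in nearest) / k


# Hypothetical finite population with m(x) = sin(2*pi*x) plus small noise.
random.seed(0)
N, n, k = 10_000, 1_000, 30
x_pop = [(random.random(),) for _ in range(N)]
y_pop = [math.sin(2 * math.pi * x[0]) + random.gauss(0, 0.1) for x in x_pop]

# Simple random sampling without replacement of n units out of N.
sample = random.sample(range(N), n)
x_s = [x_pop[i] for i in sample]
y_s = [y_pop[i] for i in sample]

# Estimate m(0.25), whose true value in this simulation is 1.
estimate = knn_regress(x_s, y_s, (0.25,), k)
print(estimate)
```

Increasing `n` while letting `k` grow more slowly than `n` should bring the estimate closer to the true regression value in this simulation, which is the kind of behavior a consistency result describes.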

Paper Structure (11 sections, 5 theorems, 44 equations, 5 figures)

Key Result

Proposition 1

Suppose that Conditions (Ccondition:y:bounded) to (Ccondition:kn:n) hold. The sample estimator $\widehat{m}_n({\mathbf{x}})$ is $L^2$-consistent for $m({\mathbf{x}})$ and satisfies with $\xi$-probability one.

Figures (5)

  • Figure 1: Left panel: 15 population units (filled and empty dots), six sample units (filled dots), closed ball of radius $\rho_{4U}({\mathbf{x}})$ centered at ${\mathbf{x}}$ ($B\left({\mathbf{x}}, \rho_{4U}({\mathbf{x}})\right)$, solid circle), and closed ball of radius $\rho_{4S}({\mathbf{x}})$ centered at ${\mathbf{x}}$ ($B\left({\mathbf{x}}, \rho_{4S}({\mathbf{x}})\right)$, dashed circle). Right panel: six sample units (filled dots), partition of $\mathbb{R}^2$ obtained from the 4-nearest neighbors applied to sample units (polygons), labels of four nearest sample units of any point in a polygon (four-figure numbers within the polygons).
  • Figure 2: Simulated data: maximum value of the ratio of the overall sampling fraction $f$ to the sampling fraction $f(x^o_i)$ in the ball of radius $\rho_{k_nS}(x^o_i)$ centered at $x^o_i$ over a sequence of values ${\mathbf{x}}^o$ in the support of the covariate. Different population sizes $N$ and sampling designs are considered. The sampling designs are proportional to size sampling (pps), simple random sampling without replacement (srswor), and stratified sampling (stratified).
  • Figure 3: Simulated data: average value of $|r_{ij}|$ for different population sizes $N$.
  • Figure 4: Simulated data: boxplots of the value of the MSE of the sample $k_n$-nearest neighbors estimator $\widehat{m}_n(x^o_i)$, $i = 1, \ldots, 10$ for nine populations of respective size $N$. The dashed line is the value of $\frac{1}{k_n} + \frac{k_n}{n}$ for the nine populations (multiplied by 2.2 for graphical reasons).
  • Figure 5: White Wine data: boxplots of the value of the MSE of the sample $k_n$-nearest neighbors estimator $\widehat{m}_n({\mathbf{x}}^o_i)$, $i = 1, \ldots, 100$ for five populations of respective size $N$. The dashed line is the value of $\frac{1}{k_n} + \left(\frac{k_n}{n}\right)^{2/10}$ for the five populations (multiplied by 4.5 for graphical reasons).

Theorems & Definitions (9)

  • Proposition 1
  • proof
  • Corollary 5.1
  • Proposition 2
  • proof
  • Proposition 3
  • Lemma 1
  • proof
  • proof: Proof of Proposition \ref{proposition:knn:design:consistency}