A comparison of Dirichlet kernel regression methods on the simplex

Hanen Daayeb, Christian Genest, Salah Khardani, Nicolas Klutchnikoff, Frédéric Ouimet

TL;DR

The paper develops an asymmetric Dirichlet-kernel version of the Gasser–Müller (GM) estimator for nonparametric regression on the simplex, extending prior univariate results to multivariate simplex domains. It derives asymptotic properties under fixed design (and discusses random design), including pointwise bias, variance, MISE, and asymptotic normality, with a careful treatment of boundary effects. A comparative simulation shows the Dirichlet local linear smoother outperforming its Nadaraya–Watson (NW) and GM counterparts, and the local linear smoother is then demonstrated on the GEMAS soil dataset to relate soil composition to soil pH. Overall, the work advances boundary-corrected, adaptive kernel regression on the simplex and provides guidance for bandwidth choice and method selection in applied settings.

Abstract

An asymmetric Dirichlet kernel version of the Gasser–Müller estimator is introduced for regression surfaces on the simplex, extending the univariate analog proposed by Chen [Statist. Sinica, 10(1) (2000), pp. 73–91]. Its asymptotic properties are investigated under the condition that the design points are known and fixed, including an analysis of its mean integrated squared error (MISE) and its asymptotic normality. The estimator is also applicable in a random design setting. A simulation study compares its performance with two recently proposed alternatives: the Nadaraya–Watson estimator with Dirichlet kernel and the local linear smoother with Dirichlet kernel. The results show that the local linear smoother consistently outperforms the others. To illustrate its applicability, the local linear smoother is applied to the GEMAS dataset to analyze the relationship between soil composition and pH levels across various agricultural and grazing lands in Europe.
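To make the kernel idea concrete, here is a minimal Python sketch of the simplest of the three estimators compared, the Nadaraya–Watson estimator with Dirichlet kernel. It assumes the usual Dirichlet-kernel parametrization, in which the kernel at an evaluation point s with bandwidth b is the Dirichlet density with parameters s/b + 1 and (1 - ||s||_1)/b + 1. The function names `dirichlet_log_kernel` and `nw_estimate` are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math

def dirichlet_log_kernel(s, b, x):
    """Log-density at x of the Dirichlet(s/b + 1, (1 - sum(s))/b + 1) kernel.

    s and x are points of the d-dimensional simplex given by their first d
    coordinates; x must lie in the interior (all x_j > 0 and sum(x) < 1).
    This parametrization is an assumption of the sketch.
    """
    alpha = [sj / b + 1.0 for sj in s]
    beta = (1.0 - sum(s)) / b + 1.0
    # Log of the Dirichlet normalising constant.
    log_norm = (math.lgamma(sum(alpha) + beta)
                - sum(math.lgamma(a) for a in alpha)
                - math.lgamma(beta))
    log_dens = sum((a - 1.0) * math.log(xj) for a, xj in zip(alpha, x))
    log_dens += (beta - 1.0) * math.log(1.0 - sum(x))
    return log_norm + log_dens

def nw_estimate(s, b, X, Y):
    """Kernel-weighted average of the responses Y at the evaluation point s."""
    logw = [dirichlet_log_kernel(s, b, x) for x in X]
    shift = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(l - shift) for l in logw]
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)

# Toy data on the two-dimensional simplex (illustrative values only).
X = [(0.2, 0.3), (0.5, 0.1), (0.1, 0.6), (0.3, 0.3)]
Y = [1.0, 2.0, 3.0, 4.0]
estimate = nw_estimate((0.25, 0.25), 0.1, X, Y)
```

Because the kernel's shape changes with s, no weight is assigned outside the simplex, which is the motivation for using asymmetric kernels near the boundary. The Gasser–Müller and local linear variants studied in the paper use the same kernel but combine the responses differently.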

Paper Structure

This paper contains 13 sections, 8 theorems, 73 equations, 3 figures, 1 table.

Key Result

Proposition 4.1

Suppose that Assumptions ass:1--ass:4 hold. Then, as $n\to \infty$ and uniformly for all $\boldsymbol{s}\in \mathcal{S}_d$, one has where the function $g$ is defined, for all $\boldsymbol{s}\in \mathcal{S}_d$, by

Figures (3)

  • Figure 3.1: The black dots represent the sequence of design points $\boldsymbol{x}_1,\dots,\boldsymbol{x}_n$, which is chosen here to form a grid of mesh size $\asymp n^{-1/2}$ over the two-dimensional simplex $\mathcal{S}_2$. The sequence $B_1,\dots,B_n$ is chosen to be the corresponding Voronoi diagram in the simplex, where each polygonal region $B_i$ around $\boldsymbol{x}_i$ is the Voronoi cell of $\boldsymbol{x}_i$. The same construction generalizes straightforwardly to higher dimensions.
  • Figure 6.1: Plot of leave-one-out cross-validation criterion as a function of the bandwidth for the GEMAS dataset.
  • Figure 6.2: Density plot of the estimated pH in CaCl2 as a function of the proportion of sand and silt.
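
The leave-one-out cross-validation criterion plotted in Figure 6.1 can be sketched as follows. The paper applies it to the local linear smoother; for brevity, this sketch substitutes the simpler Nadaraya–Watson estimator with Dirichlet kernel, and the names `nw_weights`, `loo_cv`, and `select_bandwidth`, as well as the bandwidth grid, are assumptions made for illustration.

```python
import math

def nw_weights(s, b, X):
    """Unnormalised Dirichlet-kernel weights of the design points X at s.

    The kernel's normalising constant depends only on s and b, so it
    cancels in the Nadaraya-Watson ratio and is omitted here.
    Design points must lie in the interior of the simplex.
    """
    rest_s = 1.0 - sum(s)
    logw = [sum((sj / b) * math.log(xj) for sj, xj in zip(s, x))
            + (rest_s / b) * math.log(1.0 - sum(x))
            for x in X]
    shift = max(logw)  # subtract the maximum for numerical stability
    return [math.exp(l - shift) for l in logw]

def loo_cv(b, X, Y):
    """Mean squared leave-one-out prediction error for bandwidth b."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Drop the i-th observation and predict it from the others.
        X_rest = X[:i] + X[i + 1:]
        Y_rest = Y[:i] + Y[i + 1:]
        w = nw_weights(X[i], b, X_rest)
        pred = sum(wi * yi for wi, yi in zip(w, Y_rest)) / sum(w)
        total += (Y[i] - pred) ** 2
    return total / n

def select_bandwidth(grid, X, Y):
    """Return the bandwidth in the grid with the smallest criterion value."""
    return min(grid, key=lambda b: loo_cv(b, X, Y))

# Toy data on the two-dimensional simplex (illustrative values only).
X = [(0.2, 0.3), (0.5, 0.1), (0.1, 0.6), (0.3, 0.3), (0.4, 0.4)]
Y = [1.0, 2.0, 3.0, 2.5, 1.5]
grid = [0.01, 0.05, 0.1, 0.5]
best_b = select_bandwidth(grid, X, Y)
```

Plotting `loo_cv(b, X, Y)` against `b` over a fine grid yields a curve of the kind shown in Figure 6.1, with the selected bandwidth at its minimum.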

Theorems & Definitions (13)

  • Proposition 4.1: Pointwise bias
  • Proposition 4.2: Pointwise variance
  • Remark 1
  • Corollary 4.3: Mean squared error
  • Remark 2
  • Theorem 4.4: Mean integrated squared error
  • Theorem 4.5: Asymptotic normality
  • Remark 3
  • Remark 4
  • Remark 5
  • ...and 3 more