A comparison of Dirichlet kernel regression methods on the simplex
Hanen Daayeb, Christian Genest, Salah Khardani, Nicolas Klutchnikoff, Frédéric Ouimet
TL;DR
The paper develops an asymmetric Dirichlet-kernel version of the Gasser-Müller estimator for nonparametric regression on the simplex, extending Chen's univariate estimator to the multivariate simplex. It derives the estimator's asymptotic properties under a fixed design (and discusses the random design case), including pointwise bias, variance, MISE, and asymptotic normality, with attention to boundary effects. A simulation study shows the Dirichlet local linear smoother consistently outperforming its Nadaraya-Watson and Gasser-Müller counterparts, and that smoother is then applied to the GEMAS soil dataset to relate soil composition to soil pH. Overall, the work advances boundary-adaptive kernel regression on the simplex and offers guidance on bandwidth choice and method selection in applied settings.
Abstract
An asymmetric Dirichlet kernel version of the Gasser-Müller estimator is introduced for regression surfaces on the simplex, extending the univariate analog proposed by Chen [Statist. Sinica, 10(1) (2000), pp. 73-91]. Its asymptotic properties are investigated under the condition that the design points are known and fixed, including an analysis of its mean integrated squared error (MISE) and its asymptotic normality. The estimator is also applicable in a random design setting. A simulation study compares its performance with two recently proposed alternatives: the Nadaraya-Watson estimator with Dirichlet kernel and the local linear smoother with Dirichlet kernel. The results show that the local linear smoother consistently outperforms the others. To illustrate its applicability, the local linear smoother is applied to the GEMAS dataset to analyze the relationship between soil composition and pH levels across various agricultural and grazing lands in Europe.
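To make the idea of Dirichlet-kernel smoothing concrete, the following is a minimal sketch of the simplest of the three estimators compared, the Nadaraya-Watson estimator. It is not the authors' implementation. It assumes the parametrization common in the Dirichlet-kernel literature, in which the weight of a design point x at an evaluation point s is the Dirichlet(s/b + 1) density evaluated at x, with compositions written as full vectors that sum to 1. The function names, the bandwidth b, and the toy data are hypothetical.

```python
import math


def dirichlet_log_pdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a composition x.

    Assumes every component of x is strictly positive and that x sums to 1.
    """
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def nw_dirichlet(s, X, Y, b):
    """Nadaraya-Watson estimate at s with Dirichlet-kernel weights.

    s : evaluation point (composition summing to 1)
    X : list of design points (compositions with positive components)
    Y : list of responses, one per design point
    b : bandwidth (> 0); smaller values concentrate the kernel around s
    """
    # Assumed parametrization: the kernel's shape depends on s itself,
    # which is what lets it adapt near the boundary of the simplex.
    alpha = [si / b + 1.0 for si in s]
    log_w = [dirichlet_log_pdf(x, alpha) for x in X]
    # Subtract the largest log-weight before exponentiating to avoid underflow.
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)


# Toy data: three-part compositions with a hypothetical response.
X = [(0.5, 0.3, 0.2), (0.3, 0.5, 0.2), (0.2, 0.2, 0.6), (0.1, 0.7, 0.2)]
Y = [6.1, 6.8, 5.4, 7.2]
estimate = nw_dirichlet((0.3, 0.4, 0.3), X, Y, b=0.1)
```

Because the weights are nonnegative and normalized, the estimate is always a weighted average of the observed responses. The Gasser-Müller and local linear variants discussed in the abstract use the same kernel but combine the observations differently.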
