Dirichlet kernel density estimation for strongly mixing sequences on the simplex

Hanen Daayeb, Salah Khardani, Frédéric Ouimet

TL;DR

This paper develops a Dirichlet kernel density estimator for compositional data on the simplex under strong mixing, extending previous iid results to time-dependent sequences. It provides a precise MSE expansion with a bias term of order $b^2$ and a variance term of order $n^{-1} b^{-d/2}$, along with a CLT showing $n^{1/2} b^{d/4} (\hat f_{n,b}(s) - f(s)) \to N(0, \psi(s) f(s))$ in the interior of the simplex, which enables plug-in confidence intervals. A real-data illustration on Renault market shares demonstrates bandwidth selection via leave-one-out cross-validation and the computation of 95% highest-density regions (HDRs) for trivariate compositions. The discussion points to potential extensions to Aitchison geometry and spatial settings, and to further work on uniform convergence and mode/HDR inference for dependent compositional data.
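To make the plug-in step concrete, here is a sketch of the interval that the stated CLT suggests. It assumes the bandwidth is chosen so that the bias is asymptotically negligible relative to the standard deviation, and it replaces the unknown $f(s)$ in the limiting variance with $\hat f_{n,b}(s)$. The confidence level $1-\alpha$ and the standard normal quantile $z_{1-\alpha/2}$ are notation introduced here for illustration, and $\psi(s)$ is the paper's variance functional, which this summary does not reproduce.

$$
\hat f_{n,b}(s) \;\pm\; z_{1-\alpha/2}\,\sqrt{\frac{\psi(s)\,\hat f_{n,b}(s)}{n\, b^{d/2}}}
$$

The half-width shrinks at the rate $n^{-1/2} b^{-d/4}$, which matches the normalization $n^{1/2} b^{d/4}$ in the CLT.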

Abstract

This paper investigates the theoretical properties of Dirichlet kernel density estimators for compositional data supported on simplices, for the first time addressing scenarios involving time-dependent observations characterized by strong mixing conditions. We establish rigorous results for the asymptotic normality and mean squared error of these estimators, extending previous findings from the independent and identically distributed (iid) context to the more general setting of strongly mixing processes. To demonstrate its practical utility, the estimator is applied to monthly market-share compositions of several Renault vehicle classes over a twelve-year period, with bandwidth selection performed via leave-one-out least squares cross-validation. Our findings underscore the reliability and strength of Dirichlet kernel techniques when applied to temporally dependent compositional data.
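A minimal sketch of a Dirichlet kernel estimator may help fix ideas. It assumes the form commonly used in the iid literature, where the estimate at $s$ averages over the observations the Dirichlet density with parameters $s_j/b + 1$ for all $d+1$ parts. That form is an assumption here, since this summary does not state the estimator explicitly. The function names, the synthetic Dirichlet(2, 3, 4) sample, and the bandwidth `b = 0.05` are illustrative choices. The sample is iid rather than strongly mixing, and the bandwidth is fixed by hand rather than selected by cross-validation.

```python
import math
import random


def dirichlet_log_density(x, alpha):
    """Log-density of Dirichlet(alpha) at x (all d+1 parts given, x sums to 1)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def dirichlet_kde(s, data, b):
    """Dirichlet kernel density estimate at s with bandwidth b > 0.

    Assumed form: average over observations x of the Dirichlet density
    with parameters s_j / b + 1, evaluated at x. Observations must lie
    strictly inside the simplex so that every log is finite.
    """
    alpha = [sj / b + 1.0 for sj in s]
    total = sum(math.exp(dirichlet_log_density(x, alpha)) for x in data)
    return total / len(data)


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    t = sum(g)
    return [v / t for v in g]


# Illustrative synthetic data: iid draws, NOT a strongly mixing sequence.
rng = random.Random(0)
true_alpha = [2.0, 3.0, 4.0]
data = [sample_dirichlet(true_alpha, rng) for _ in range(2000)]

s = [0.2, 0.3, 0.5]  # interior evaluation point, all three parts listed
estimate = dirichlet_kde(s, data, b=0.05)
true_density = math.exp(dirichlet_log_density(s, true_alpha))
print(f"estimate = {estimate:.3f}, true density = {true_density:.3f}")
```

Because the kernel parameters depend on the evaluation point, the kernel's shape adapts near the boundary of the simplex without any explicit boundary correction. At a fixed bandwidth the estimate is still biased, so it should not be expected to match the true density exactly.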

Paper Structure

This paper contains 9 sections, 4 theorems, 46 equations, 1 figure, 1 table.

Key Result

Theorem 1

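Suppose that Assumptions 1–3 hold. For any $\boldsymbol{s}\in \mathrm{Int}(\mathcal{S}_d)$, as $n\to \infty$, the mean squared error of $\hat f_{n,b}(\boldsymbol{s})$ admits an expansion with a bias term of order $b^2$ and a variance term of order $n^{-1} b^{-d/2}$.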

Figures (1)

  • Figure 1: Kernel density estimates for each three-component composition $(S_i,S_j,\text{other})$ on the simplex, displayed with a common color scale (darker shades represent lower density). The red closed curve marks the boundary of the $95\%$ highest-density region (HDR).

Theorems & Definitions (7)

  • Remark 1
  • Theorem 1: Mean squared error
  • Theorem 2: Asymptotic normality and plug-in confidence interval
  • Lemma 1
  • proof
  • Lemma 2
  • proof