Dirichlet kernel density estimation on the simplex with missing data

Hanen Daayeb; Wissem Jedidi; Salah Khardani; Guanjie Lyu; Frédéric Ouimet

Abstract

Nonparametric density estimation for compositional data supported on the simplex is examined under a missing at random (MAR) mechanism. Rather than imputing missing values and estimating the density from a completed data set, we adopt a strategy based on inverse probability weighting (IPW). The proposed estimator uses an adaptive Dirichlet kernel, which ensures nonnegativity on the simplex and favorable behavior near the boundary. When the observation probabilities are unknown, they are estimated through a Nadaraya-Watson regression step. The large-sample properties of the estimator are derived, including pointwise bias and variance expansions, optimal smoothing rates, and asymptotic normality. A simulation study investigates its finite-sample performance under varying sample sizes and missing rates, and shows that, for certain target densities, the proposed method outperforms IPW kernel density estimators based on additive and isometric log-ratio transformations of the data. The methodology is further illustrated through an application to leukocyte composition data from the National Health and Nutrition Examination Survey (NHANES), which allows for the identification of the modal immune profile in the sampled population.
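The estimator described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on assumptions that the abstract does not state: the kernel is taken to be the Dirichlet density with parameters $\boldsymbol{s}/b + 1$ and $(1 - \|\boldsymbol{s}\|_1)/b + 1$ (the usual adaptive Dirichlet kernel), the observation indicator $\delta_i$ is assumed to depend only on an always-observed scalar covariate $Y_i$, and the Nadaraya-Watson step uses a Gaussian kernel with bandwidth `h`. The function names `dirichlet_kernel`, `nadaraya_watson_pi` and `ipw_dirichlet_kde`, as well as the simulated MAR mechanism, are hypothetical.

```python
import math

import numpy as np


def dirichlet_kernel(s, X, b):
    """Dirichlet density with parameters (s/b + 1, (1 - ||s||_1)/b + 1), evaluated at each row of X.

    s : (d,) evaluation point in the simplex (first d coordinates).
    X : (m, d) compositions in the open simplex (first d coordinates).
    b : bandwidth > 0.
    """
    s_full = np.append(s, 1.0 - np.sum(s))                # append the (d+1)-th coordinate
    X_full = np.column_stack([X, 1.0 - X.sum(axis=1)])
    alpha = s_full / b + 1.0                              # kernel shape adapts to the location s
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return np.exp(log_norm + np.log(X_full) @ (alpha - 1.0))


def nadaraya_watson_pi(Y, delta, h):
    """Gaussian-kernel Nadaraya-Watson estimate of P(delta = 1 | Y) at each Y_i (scalar Y assumed)."""
    w = np.exp(-0.5 * ((Y[:, None] - Y[None, :]) / h) ** 2)
    return (w @ delta) / w.sum(axis=1)


def ipw_dirichlet_kde(s, X, delta, pi, b):
    """IPW estimate (1/n) * sum_i delta_i / pi_i * K(X_i); rows with delta_i = 0 are never evaluated."""
    obs = delta == 1
    return np.sum(dirichlet_kernel(s, X[obs], b) / pi[obs]) / len(delta)


# Hypothetical example with d = 2: only the first two coordinates are stored.
rng = np.random.default_rng(0)
n, b = 500, 0.05
X = rng.dirichlet([2.0, 3.0, 4.0], size=n)[:, :2]
# Hypothetical always-observed covariate and MAR mechanism (assumptions for illustration).
Y = X[:, 0] + 0.1 * rng.standard_normal(n)
pi_true = 1.0 / (1.0 + np.exp(-(1.0 + 2.0 * Y)))
delta = rng.binomial(1, pi_true)
pi_hat = nadaraya_watson_pi(Y, delta, h=0.1)

s = np.array([0.25, 0.35])
est_true = ipw_dirichlet_kde(s, X, delta, pi_true, b)   # known observation probabilities
est_hat = ipw_dirichlet_kde(s, X, delta, pi_hat, b)     # estimated observation probabilities
print(est_true, est_hat)
```

In this sketch, passing the true probabilities plays the role of an oracle (pseudo) estimator and passing the Nadaraya-Watson estimates gives a feasible one. Because only rows with $\delta_i = 1$ are evaluated, missing compositions may be stored as `NaN`.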

Paper Structure

This paper contains 26 sections, 10 theorems, 135 equations, 9 figures, and 2 tables.

Introduction
Definitions and notation
Assumptions
Main results
Results for the pseudo estimator $\widetilde{f}_{n,b}$
Results for the feasible estimator $\hat{f}_{n,b}$
Simulation results
Models and setup
Bandwidth selection
Performance evaluation
Comparative study
Impact of the sample size
Impact of the missing rate
Joint effect of sample size and missing rate
Real-data application
...and 11 more sections

Key Result

Proposition 4.1

Suppose that Assumptions ass:1, ass:3, and ass:5 hold. Uniformly for $\boldsymbol{s}\in \mathcal{S}_d$, we have where
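For orientation, the kind of bias expansion that Proposition 4.1 concerns can be traced to two elementary facts, sketched here under assumptions of our own: MAR in the form $\mathbb{P}(\delta = 1 \mid X, Y) = \pi(Y) > 0$ with $\pi$ known, the standard Dirichlet kernel $K_{\boldsymbol{s},b}$ with parameters $(\boldsymbol{s}/b + 1, (1 - \|\boldsymbol{s}\|_1)/b + 1)$, and $f$ twice continuously differentiable on $\mathcal{S}_d$. Inverse probability weighting removes the missingness from the mean, and the mean and covariance of the kernel then give the usual complete-data Dirichlet KDE bias. This is a sketch only; the precise conditions and form of Proposition 4.1 may differ.

```latex
% Assumptions (ours): MAR with P(delta = 1 | X, Y) = pi(Y) > 0, pi known; f in C^2(S_d);
% K_{s,b} is the Dirichlet density with parameters (s/b + 1, (1 - ||s||_1)/b + 1), ||s||_1 = s_1 + ... + s_d.
\begin{align*}
\mathbb{E}\!\left[\frac{\delta}{\pi(Y)}\,K_{\boldsymbol{s},b}(X)\right]
  &= \mathbb{E}\!\left[\frac{\mathbb{E}[\delta \mid X, Y]}{\pi(Y)}\,K_{\boldsymbol{s},b}(X)\right]
   = \mathbb{E}[K_{\boldsymbol{s},b}(X)]
   = \mathbb{E}[f(\boldsymbol{\xi}_{\boldsymbol{s}})],
   \qquad \boldsymbol{\xi}_{\boldsymbol{s}} \sim \mathrm{Dir}\!\left(\tfrac{\boldsymbol{s}}{b} + 1, \tfrac{1 - \|\boldsymbol{s}\|_1}{b} + 1\right),\\
\mathbb{E}[\xi_{\boldsymbol{s},i}] - s_i
  &= \frac{s_i + b}{1 + (d+1)b} - s_i = b\,\{1 - (d+1)s_i\} + \mathcal{O}(b^2),\\
\mathrm{Cov}(\xi_{\boldsymbol{s},i}, \xi_{\boldsymbol{s},j})
  &= b\, s_i\,(\mathbb{1}_{\{i=j\}} - s_j) + \mathcal{O}(b^2),\\
\mathbb{E}[f(\boldsymbol{\xi}_{\boldsymbol{s}})] - f(\boldsymbol{s})
  &= b\, g(\boldsymbol{s}) + o(b), \qquad
g(\boldsymbol{s}) = \sum_{i=1}^d \{1 - (d+1)s_i\}\,\partial_i f(\boldsymbol{s})
  + \frac{1}{2} \sum_{i,j=1}^d s_i\,(\mathbb{1}_{\{i=j\}} - s_j)\,\partial_{ij}^2 f(\boldsymbol{s}).
\end{align*}
```

The last line follows from a second-order Taylor expansion of $f$ around $\boldsymbol{s}$, using the two kernel moments on the preceding lines.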

Figures (9)

Figure 1: Visualization of the MAR mechanisms for Model I (left panel) and Model II (right panel).
Figure 2: Contour plots of the Model I target density $f$ (left panel) and the associated Dirichlet kernel density estimate $\hat{f}_{n,0.05}$ (right panel), with a sample size $n = 2000$ and a $10\%$ missing rate.
Figure 3: Contour plots of the Model II target density $f$ (left panel) and the associated Dirichlet kernel density estimate $\hat{f}_{n,0.05}$ (right panel), with a sample size $n = 2000$ and a $10\%$ missing rate.
Figure 4: Mean, median, standard deviation, and interquartile range of $1000$ ISEs in Model I for the IPW Dirichlet KDE as a function of the proportion of missing data, shown for four sample sizes $n\in \{100, 200, 400, 800\}$.
Figure 5: Mean, median, standard deviation, and interquartile range of $1000$ ISEs in Model II for the IPW Dirichlet KDE as a function of the proportion of missing data, shown for four sample sizes $n\in \{100, 200, 400, 800\}$.
...and 4 more figures
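Figures 4 and 5 report the mean, median, standard deviation, and interquartile range of 1000 ISEs. A minimal sketch of how one ISE on the two-dimensional simplex and those four summaries might be computed follows. The centroid quadrature on a regular triangulation, the grid size `m`, and the helper names `simplex_centroids`, `ise` and `summarize` are our own choices and are not taken from the paper.

```python
import numpy as np


def simplex_centroids(m):
    """Centroids of the m*m congruent triangles tiling {s1, s2 >= 0, s1 + s2 <= 1}.

    Returns an (m*m, 2) array of centroids and the common triangle area 1 / (2 m^2).
    """
    pts = []
    for i in range(m):
        for j in range(m - i):
            pts.append(((3 * i + 1) / (3 * m), (3 * j + 1) / (3 * m)))      # upward triangle
            if i + j <= m - 2:
                pts.append(((3 * i + 2) / (3 * m), (3 * j + 2) / (3 * m)))  # downward triangle
    return np.array(pts), 1.0 / (2 * m * m)


def ise(f_hat, f, m=200):
    """Centroid-rule approximation of the integral of (f_hat - f)^2 over the 2-simplex.

    f_hat and f map an (N, 2) array of points to an (N,) array of density values.
    """
    pts, area = simplex_centroids(m)
    return area * np.sum((f_hat(pts) - f(pts)) ** 2)


def summarize(ises):
    """Mean, median, sample standard deviation, and interquartile range of replicated ISEs."""
    ises = np.asarray(ises, dtype=float)
    q1, med, q3 = np.percentile(ises, [25, 50, 75])
    return {"mean": ises.mean(), "median": med, "sd": ises.std(ddof=1), "iqr": q3 - q1}


def dir222(s):
    """Dirichlet(2, 2, 2) density 120 * s1 * s2 * (1 - s1 - s2)."""
    return 120.0 * s[:, 0] * s[:, 1] * (1.0 - s[:, 0] - s[:, 1])


def uniform(s):
    """Uniform density on the 2-simplex, which equals 2 (the simplex has area 1/2)."""
    return np.full(len(s), 2.0)


print(ise(dir222, uniform))  # exact value is 6/7
```

The check uses two densities whose ISE is known in closed form. In a simulation, `f_hat` would be a fitted estimator evaluated at the quadrature points, and `summarize` would be applied to the ISEs collected over the replications.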

Theorems & Definitions (13)

Proposition 4.1: Pointwise bias
Proposition 4.2: Pointwise variance
Corollary 4.3: Mean squared error
Theorem 4.4: Asymptotic normality
Proposition 4.5: Pointwise bias
Proposition 4.6: Pointwise variance
Corollary 4.7: Mean squared error
Theorem 4.8: Asymptotic normality
Remark 1
Lemma 9.1
...and 3 more
