Kernel density estimation with polyspherical data and its applications

Eduardo García-Portugués, Andrea Meilán-Vila

Abstract

A kernel density estimator for data on the polysphere $\mathbb{S}^{d_1}\times\cdots\times\mathbb{S}^{d_r}$, with $r,d_1,\ldots,d_r\geq 1$, is presented in this paper. We derive the main asymptotic properties of the estimator, including mean square error, normality, and optimal bandwidths. We address the kernel theory of the estimator beyond the von Mises-Fisher kernel, introducing new kernels that are more efficient and investigating normalizing constants, moments, and sampling methods thereof. Plug-in and cross-validated bandwidth selectors are also obtained. As a spin-off of the kernel density estimator, we propose a nonparametric $k$-sample test based on the Jensen-Shannon divergence. Numerical experiments illuminate the asymptotic theory of the kernel density estimator and demonstrate the superior performance of the $k$-sample test with respect to parametric alternatives in certain scenarios. Our smoothing methodology is applied to the analysis of the morphology of a sample of hippocampi of infants embedded on the high-dimensional polysphere $(\mathbb{S}^2)^{168}$ via skeletal representations ($s$-reps).
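As a rough illustration of the kind of estimator the abstract describes, the following minimal sketch evaluates a product von Mises-Fisher kernel density estimate on $(\mathbb{S}^2)^r$. It rests on assumptions not stated in the abstract: the kernel is taken as a product of vMF densities, one per sphere, with concentration $\kappa_j = 1/h_j^2$; every sphere is restricted to $\mathbb{S}^2$ so that the vMF normalizing constant has a closed form; and the function names `vmf_s2_log_kernel` and `polysphere_kde` are hypothetical. It is not the authors' implementation.

```python
import math


def vmf_s2_log_kernel(x, y, h):
    """Log of a vMF kernel on S^2 centred at y, evaluated at x.

    Assumption: bandwidth h maps to concentration kappa = 1 / h^2.
    The vMF density on S^2 is kappa / (4 pi sinh(kappa)) * exp(kappa * x'y);
    it is rewritten as kappa / (2 pi (1 - exp(-2 kappa))) * exp(kappa (x'y - 1))
    so that large kappa does not overflow.
    """
    kappa = 1.0 / h ** 2
    dot = sum(a * b for a, b in zip(x, y))
    log_const = (
        math.log(kappa)
        - math.log(2.0 * math.pi)
        - math.log1p(-math.exp(-2.0 * kappa))
    )
    return log_const + kappa * (dot - 1.0)


def polysphere_kde(x, sample, h):
    """Product-kernel density estimate at x on (S^2)^r.

    x      : list of r unit vectors in R^3 (one per sphere)
    sample : list of n observations, each a list of r unit vectors
    h      : list of r positive bandwidths (one per sphere)
    """
    total = 0.0
    for obs in sample:
        # Sum of log-kernels over the r spheres = log of the product kernel.
        log_k = sum(
            vmf_s2_log_kernel(xj, yj, hj) for xj, yj, hj in zip(x, obs, h)
        )
        total += math.exp(log_k)
    return total / len(sample)


# Toy data on (S^2)^2: two observations, each a pair of unit vectors.
toy_sample = [
    [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, -1.0]],
]
density_at_first = polysphere_kde(toy_sample[0], toy_sample, [0.5, 0.5])
```

With very large bandwidths the estimate flattens towards the uniform density $(4\pi)^{-r}$ on $(\mathbb{S}^2)^r$, which gives a quick sanity check of the normalizing constant.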

Paper Structure

This paper contains 22 sections, 16 theorems, 143 equations, 13 figures, 1 table.

Key Result

Theorem 3.1

Under A1--A3, for $\boldsymbol{x}\in\mathbb{S}^{\boldsymbol{d}}$, where with $\lambda_{\boldsymbol{d}}(L(\boldsymbol{s})s_j)\equiv\omega_{\boldsymbol{d}-1}\int_{\mathbb{R}_+^r}L\left(\boldsymbol{s}\right) s_j \prod_{j=1}^r s_j^{d_j/2-1} 2^{d_j/2-1}\,\mathrm{d} \boldsymbol{s}$ and $\boldsymbol{\mathcal{H}}_{jj}\bar{f}(\boldsymbol{x})$ being the $(d_j+1)\times(d_j+1)$ mar

Figures (13)

  • Figure 1: Kernels and their efficiencies. Figure \ref{fig:effic:a} shows the approximation of the Epa kernel by the sfp kernel as $\upsilon$ increases. Figure \ref{fig:effic:b} displays the efficiencies of the vMF and sfp kernels with respect to the $\mathrm{Epa}^S$ kernel for a fixed $r=1$. Figure \ref{fig:effic:c} does the same, but now for a fixed $d=2$.
  • Figure 2: Construction of an $s$-rep and scatterplots of directions of different spokes. Figure \ref{fig:srepa} sketches the $s$-rep fitting process \cite{liu2021fitting}, where a hippocampus surface (blue, top) is parametrized as a set of base skeletal points (red) and a collection of $168$ spokes (green segments) connecting with the boundary (blue points). Figure \ref{fig:srepb} shows the $\mathbb{S}^2$-directions of the $n=177$ spokes located at boundary points $88$, $122$, $146$, and $157$ (from left to right and top to bottom). The locations of these spokes are shown in Figure \ref{fig:srepa}.
  • Figure 3: Dataset of $n=177$ hippocampi ranked by the density of their shapes, as determined by \eqref{eq:rank} using $\hat{\boldsymbol{h}}_{\mathrm{ROT}}$ and the sfp kernel with $\upsilon=100$. The yellow--violet color gradient indicates the inward--outward ranking. The red dots indicate whether the 6-month-old infant later developed autism (see Section \ref{sec:classes}).
  • Figure 4: $p$-values of the test based on $T_{n,\mathrm{JSD}}(c\times\hat{{\boldsymbol{h}}}_\mathrm{ROT})$ as a function of the factor $c$. The sfp kernel with $\upsilon=100$ is used in the test.
  • Figure 5: Density of $\mathcal{N}(0,1)$ (black curves) and kdes of $M=10^5$ observations of ${Z_{n,\delta}^{(2)}}$, for $n=2^\ell$, $\ell=7,8,\ldots,17$ (curves in color gradient from dark violet to yellow) and $\delta=-2,-1,0,1,2,4$.
  • ...and 8 more figures

Theorems & Definitions (39)

  • Theorem 3.1: Asymptotic bias and variance
  • Corollary 3.1: Improved asymptotic bias expansion
  • Remark 3.1
  • Corollary 3.2: Strong pointwise consistency
  • Corollary 3.3: AMISE optimal bandwidth
  • Theorem 3.2: Pointwise asymptotic normality
  • Remark 3.2
  • Remark 3.3
  • Proposition 4.1: Normalizing constant for the Epa kernel
  • Proposition 4.2: Angular cdf of the Epa kernel
  • ...and 29 more