An effective estimation of multivariate density functions using extended-beta kernels with Bayesian adaptive bandwidths

Sobom M. Somé, Célestin C. Kokonendji, Francial G. B. Libengué Dobélé-Kpoka

TL;DR

This paper develops a unified multivariate density estimator based on the multiple extended-beta kernel (MEBK) on compact supports, defining $\widehat{f}_{n}(\boldsymbol{x})=\frac{1}{n}\sum_{i=1}^{n}\prod_{j=1}^{d} EB_{x_j,h_j,a_j,b_j}(X_{ij})$ with $EB_{x,h,a,b}$ the univariate extended-beta kernel. It establishes bias, variance, and asymptotic normality under suitable smoothness and bandwidth conditions, and introduces a Bayesian adaptive bandwidth selector using independent inverse-gamma priors $IG(\alpha,\beta_\ell)$, yielding Bayes estimators $\widetilde{\boldsymbol{h}}_i=\mathbb{E}(\boldsymbol{h}_i|\mathbf{X}_i)$ and practical choices like $\alpha=n^{2/5}$ and $\beta_\ell=1$, along with an automatic support estimator $\widehat{\mathbb{T}}_d$. Through extensive simulations (univariate and multivariate) and real-data applications (cholesterol, Old Faithful, and student marks), the MEBK with Bayesian bandwidths demonstrates competitive or superior smoothing performance measured by ISE and log-likelihood compared to Gaussian and gamma kernels, especially near boundaries. The work provides explicit theoretical results, practical bandwidth rules, and empirical evidence of flexibility and universality in density estimation across bounded and unbounded domains, with potential for software implementation and future methodological extensions such as combined MEBK variants.
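The product-kernel estimator above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that the univariate extended-beta kernel $EB_{x,h,a,b}$ is the beta density rescaled to $[a,b]$ with shape parameters $1+(x-a)/\{(b-a)h\}$ and $1+(b-x)/\{(b-a)h\}$, which places its mode at the target $x$; this summary does not state the kernel formula, so that form is an assumption. The function names `extended_beta_kernel` and `mebk_density` are hypothetical, and the bandwidths are taken as given inputs rather than computed by the Bayesian adaptive selector.

```python
import math


def _xlogy(c, y):
    """Return c * log(y) with the conventions 0 * log(0) = 0 and c * log(0) = -inf."""
    if c == 0.0:
        return 0.0
    if y == 0.0:
        return -math.inf
    return c * math.log(y)


def extended_beta_kernel(u, x, h, a, b):
    """Assumed extended-beta kernel: a beta density on [a, b] whose mode is x."""
    if not (a < b and a <= x <= b and h > 0):
        raise ValueError("need a < b, a <= x <= b and h > 0")
    if u < a or u > b:
        return 0.0  # the kernel vanishes outside the compact support
    p = 1.0 + (x - a) / ((b - a) * h)
    q = 1.0 + (b - x) / ((b - a) * h)
    # log of B(p, q) * (b - a)^(p + q - 1), the normalizing constant
    log_norm = (math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
                + (p + q - 1.0) * math.log(b - a))
    log_core = _xlogy(p - 1.0, u - a) + _xlogy(q - 1.0, b - u)
    return math.exp(log_core - log_norm)


def mebk_density(x, data, h, lower, upper):
    """Average over observations of the product of univariate kernels.

    x, h, lower, upper are length-d sequences; data is a list of length-d rows.
    """
    if not data:
        raise ValueError("data must contain at least one observation")
    total = 0.0
    for row in data:
        prod = 1.0
        for j, xj in enumerate(x):
            prod *= extended_beta_kernel(row[j], xj, h[j], lower[j], upper[j])
        total += prod
    return total / len(data)


# Toy usage with made-up observations on [1, 2] x [0, 1].
toy = [[1.2, 0.5], [1.8, 0.1], [1.5, 0.9]]
value = mebk_density([1.4, 0.4], toy, [0.2, 0.2], [1.0, 0.0], [2.0, 1.0])
```

Working on the log scale keeps the normalizing constant stable when $h$ is small and the shape parameters become large. An adaptive version would replace the common bandwidth vector by one vector per observation inside the loop.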

Abstract

Multivariate kernel density estimation has attracted considerable interest. In addition to conventional methods based on (non-)classical associated kernels for (un)bounded densities and their bandwidth selections, the multiple extended-beta kernel (MEBK) estimators with Bayesian adaptive bandwidths are investigated to gain deeper insight into the estimation of multivariate density functions. Being unimodal, the univariate extended-beta smoother has an adaptable compact support that suits any dataset, since observed data are always bounded. The support of the MEBK density estimator can be known or estimated from extreme values. Asymptotic properties of the (non-)normalized estimators are established. Explicit and general choices of bandwidths using the flexible Bayesian adaptive method are provided. The behaviour of the estimator near the sensitive edges of its support is studied and compared with that of Gaussian and gamma kernel estimators. Finally, simulation studies and three applications to original and standard real data sets show advantages of the proposed method in terms of both flexibility and universality.
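To illustrate the idea of estimating the support from extreme values, here is a minimal sketch. It assumes the simplest possible rule, namely the coordinate-wise minimum and maximum of the sample, optionally widened by a user-chosen margin; the paper's actual support estimator $\widehat{\mathbb{T}}_d$ is not spelled out in this summary and may differ. The function name `estimate_support` and the `margin` parameter are hypothetical.

```python
def estimate_support(data, margin=0.0):
    """Return (lower, upper) bounds per coordinate from the sample extremes.

    data is a list of length-d rows; margin >= 0 widens each interval on both
    sides (an illustrative choice, not the paper's rule).
    """
    if not data:
        raise ValueError("data must contain at least one observation")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    d = len(data[0])
    lower = [min(row[j] for row in data) - margin for j in range(d)]
    upper = [max(row[j] for row in data) + margin for j in range(d)]
    for j in range(d):
        if lower[j] >= upper[j]:
            # a zero-width interval cannot serve as a compact kernel support
            raise ValueError(f"coordinate {j} has a degenerate range")
    return lower, upper


# Toy usage with made-up bivariate observations.
lower, upper = estimate_support([[1.0, 5.0], [2.0, 3.0], [1.5, 4.0]], margin=0.05)
```

The resulting bounds can then be used as the $a_j$ and $b_j$ of each univariate kernel.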

Paper Structure

This paper contains 10 sections, 7 theorems, 57 equations, 7 figures, 13 tables.

Key Result

Lemma 3.1

The MEBK $\;\prod_{j=1}^{d}{EB}_{x_j,h_j,a_j,b_j}(\cdot)$ from betaestimator to eq:BxH is such that Furthermore, for some $\alpha_j>0$ ($j=1,\ldots,d$) and any $\boldsymbol{x}=(x_{1},\ldots,x_{d})^{\top} \in\mathbb{T}_d=\times_{j=1}^{d}[a_{j}, b_{j}]$, one has:

Figures (7)

  • Figure 1: Histogram with its corresponding smoothings of the cholesterol data on $[1,2]$ from Table \ref{univ_data} using univariate Gaussian and extended-beta kernels with both cross-validation and Bayes selectors of bandwidths.
  • Figure 2: Shapes of univariate extended-beta kernels with different targets $x$ and same smoothing parameter $h=0.2$ (a); with same target $x=5$ and different smoothing parameters (b).
  • Figure 3: Plots of the ISE using the extended-beta smoother \ref{betaestimator} in Scenario B with $n = 500$ and for different values of $\alpha$ and $\beta_\ell = \beta$ of the prior distribution \ref{prior}.
  • Figure 4: True pdf and its corresponding smoothings using extended-beta kernels with UCV and Bayesian adaptive bandwidths for Scenarios A, B, C and D with $n=50$ (left); $n=200$ (right).
  • Figure 5: Histogram with its corresponding smoothings of the cholesterol data of Table \ref{univ_data} using univariate extended-beta kernels with both UCV and Bayes selectors of bandwidths for $\alpha=n^{2/5}$ and $\beta=0.5$ on: (a) $[a_1,b_1]=[1,2]$ and (b) $[a_1,b_1]=[0.95,2.05]$.
  • ...and 2 more figures

Theorems & Definitions (16)

  • Lemma 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Proposition 3.5
  • Proposition 3.6
  • Theorem 4.1
  • Remark 4.2
  • Remark 4.3
  • Proof of Lemma \ref{lemma}
  • ...and 6 more