Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling

Cristiano Tamborrino, Antonella Falini, Francesca Mazzia

TL;DR

A monovariate density approximation based on spline quasi-interpolation is proposed and applied to clustering through a finite mixture copula model.

Abstract

Density estimation is a fundamental technique used in many fields to model and understand the underlying distribution of data. Its primary objective is to estimate the probability density function of a random variable. The process is particularly valuable for univariate or multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. In this paper we propose a monovariate approximation of the density based on spline quasi-interpolation and apply it to clustering modeling. The clustering technique builds suitable multivariate distributions that rely on estimates of the monovariate empirical densities (marginals). These marginals are approximated with the proposed spline quasi-interpolation, while the joint distributions that model the sought clustering partition are constructed with copula functions. In particular, since copulas capture the dependence between the features of the data independently of the marginal distributions, a finite mixture copula model is proposed. The presented algorithm is validated on artificial and real datasets.
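
To make the construction in the abstract concrete, the following is a minimal sketch of one mixture component built from monovariate marginals and a copula, followed by a single posterior (E-step) computation. It is not the authors' implementation: a Gaussian KDE from SciPy stands in for the spline quasi-interpolation estimator, a Gaussian copula is assumed as the copula family, and each component is fitted from known labels rather than by an iterative mixture fit. All function names (`fit_component`, `marginal_cdfs`, `component_logpdf`, `responsibilities`) are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import gaussian_kde, norm


def marginal_cdfs(kdes, X):
    """Map each feature through its estimated marginal CDF, giving values in (0, 1)."""
    U = np.empty(X.shape, dtype=float)
    for j, kde in enumerate(kdes):
        U[:, j] = [kde.integrate_box_1d(-np.inf, v) for v in X[:, j]]
    return np.clip(U, 1e-6, 1 - 1e-6)  # keep norm.ppf finite


def fit_component(X):
    """Fit one component: 1-D marginals plus a Gaussian copula (both are assumptions)."""
    # Stand-in marginal estimator: Gaussian KDE instead of spline quasi-interpolation.
    kdes = [gaussian_kde(X[:, j]) for j in range(X.shape[1])]
    U = marginal_cdfs(kdes, X)
    # Copula parameter: correlation of the normal scores of the pseudo-observations.
    R = np.corrcoef(norm.ppf(U), rowvar=False)
    return kdes, R


def component_logpdf(X, kdes, R):
    """log f(x) = log c(F_1(x_1), ..., F_d(x_d)) + sum_j log f_j(x_j)."""
    Z = norm.ppf(marginal_cdfs(kdes, X))
    _, logdet = np.linalg.slogdet(R)
    quad = np.einsum("ni,ij,nj->n", Z, np.linalg.inv(R) - np.eye(len(R)), Z)
    log_copula = -0.5 * logdet - 0.5 * quad
    # Floor the marginal densities so the logarithm stays finite far from the data.
    log_marginals = sum(
        np.log(np.maximum(kde(X[:, j]), 1e-300)) for j, kde in enumerate(kdes)
    )
    return log_copula + log_marginals


def responsibilities(X, components, weights):
    """Posterior probability of each component for each sample (one E-step)."""
    log_joint = np.column_stack(
        [np.log(w) + component_logpdf(X, *comp) for comp, w in zip(components, weights)]
    )
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))


# Synthetic two-cluster data with different dependence structures (illustrative only).
rng = np.random.default_rng(0)
A = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=150)
B = rng.multivariate_normal([5, 5], [[1.0, -0.5], [-0.5, 1.0]], size=150)
X = np.vstack([A, B])
labels = np.repeat([0, 1], 150)

components = [fit_component(X[labels == k]) for k in (0, 1)]
resp = responsibilities(X, components, weights=[0.5, 0.5])
pred = resp.argmax(axis=1)
print("agreement with generating labels:", (pred == labels).mean())
```

The split into `marginal_cdfs` and `component_logpdf` mirrors the separation the abstract describes: the marginal estimator can be swapped out without touching the dependence model, and vice versa.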

Paper Structure (16 sections, 5 theorems, 52 equations, 9 figures, 14 tables)

Key Result

Theorem 2.1

The function $\hat{f}$, the BSHQI estimate of $f$ on a given interval $[a,b]$, with coefficients $\lambda_j$ as defined in coeff_equation, is a density function. In particular:

Figures (9)

  • Figure 1: Comparison of samples generated from $X\sim \mathcal{N}(5,0.3)$ with the KDEpy and BSHQI methods for the probability density (a) and for the cumulative distribution (b).
  • Figure 2: Comparison of samples generated with the KDE and BSHQI methods for the probability density (a) and for the cumulative distribution (b).
  • Figure 3: Comparison of samples generated with the KDE and BSHQI methods for the probability density (a) and for the cumulative distribution (b).
  • Figure 4: Synthetic dataset: (a) Ground truth $\mathcal{X}_1$, (b) Ground truth $\mathcal{X}_2$, (c) Ground truth $\mathcal{X}_3$, (d) Ground truth $\mathcal{X}_4$, (e) Pairwise ground truth $\mathcal{X}_4$.
  • Figure 5: Synthetic dataset $\mathcal{X}_1$: (a) GMM, (b) CopMixM_BSHQI, (c) CopMixM_KDEpy.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Theorem 2.1
  • Lemma 2.2
  • Theorem 2.3
  • proof
  • Corollary 2.4
  • Definition 3.1
  • Theorem 3.2
  • Definition 3.3: Semiparametric approach