Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling

Cristiano Tamborrino; Antonella Falini; Francesca Mazzia

TL;DR

The paper proposes a univariate approximation of the density based on spline quasi-interpolation and applies it to clustering through a finite mixture copula model.

Abstract

Density estimation is a fundamental technique used in many fields to model and understand the underlying distribution of data. Its primary objective is to estimate the probability density function of a random variable. This process is particularly valuable when dealing with univariate or multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. In this paper we propose a univariate approximation of the density based on spline quasi-interpolation and apply it to clustering. The clustering technique builds suitable multivariate distributions that rely on estimates of the univariate empirical densities (marginals). The marginals are approximated with the proposed spline quasi-interpolation, while the joint distributions that model the sought clustering partition are constructed with copula functions. In particular, since copulas can capture the dependence between the features of the data independently of the marginal distributions, a finite mixture copula model is proposed. The presented algorithm is validated on artificial and real datasets.
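
The pipeline outlined in the abstract (estimate each marginal, map the data to the unit interval, then model the dependence with a copula) can be illustrated with a minimal sketch. It is not the paper's algorithm: the marginal CDF is approximated here by a rank-based empirical CDF as a stand-in for the BSHQI spline estimate, and a single bivariate Gaussian copula replaces the finite mixture. The function names `pseudo_observations`, `gaussian_copula_density`, and `joint_density` are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist


def pseudo_observations(values):
    """Map a sample to (0, 1) using the rescaled empirical CDF rank / (n + 1).

    Assumption: this is a stand-in for a smooth marginal CDF estimate
    (the paper uses BSHQI splines). Ties in `values` are not handled.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho at (u, v)."""
    std = NormalDist()
    x, y = std.inv_cdf(u), std.inv_cdf(v)  # normal scores of the uniforms
    det = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


def joint_density(f1_x, f2_y, u, v, rho):
    """Sklar's theorem for densities: copula density times the marginal densities.

    f1_x and f2_y are the marginal density values at the point of interest;
    u and v are the corresponding marginal CDF values.
    """
    return gaussian_copula_density(u, v, rho) * f1_x * f2_y
```

For example, `pseudo_observations([3.0, 1.0, 2.0])` returns `[0.75, 0.25, 0.5]`, and with `rho = 0.0` the copula density equals 1 everywhere, so the joint density reduces to the product of the marginals. In a mixture setting, each cluster would carry its own copula parameters and mixing weight.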

Paper Structure (16 sections, 5 theorems, 52 equations, 9 figures, 14 tables)

Introduction
BSHQI density estimation
Statistical Tests for marginals fitting with BSHQI spline
Copulas Mixture Model
Expectation-Maximization for Copula Mixture Model
Maximization step:
Initialization
Expectation
Maximization
Experiments
Synthetic Dataset
Real Datasets
AIS
Breast Cancer Wisconsin (Diagnostic)
Text Clustering
...and 1 more sections

Key Result

Theorem 2.1

The function $\hat{f}$, BSHQI estimation of $f$ in a given interval $[a,b]$, with $\lambda_j$ as defined in coeff_equation, is a density function. In particular:

Figures (9)

Figure 1: Comparison of samples generated from $X\sim \mathcal{N}(5,0.3)$ with the KDEpy and BSHQI method for probability density (a) and for the cumulative distribution (b).
Figure 2: Comparison of samples generated with the KDE and BSHQI method for probability density (a) and for the cumulative distribution (b)
Figure 3: Comparison of samples generated with the KDE and BSHQI method for probability density (a) and for the cumulative distribution (b).
Figure 4: Synthetic dataset: (a) Ground truth $\mathcal{X}_1$, (b) Ground truth $\mathcal{X}_2$, (c) Ground truth $\mathcal{X}_3$, (d) Ground truth $\mathcal{X}_4$, (e) Pairwise ground truth $\mathcal{X}_4$
Figure 5: Synthetic dataset $\mathcal{X}_1$: (a) GMM, (b) CopMixM_BSHQI, (c) CopMixM_KDEpy
...and 4 more figures

Theorems & Definitions (8)

Theorem 2.1
Lemma 2.2
Theorem 2.3
proof
Corollary 2.4
Definition 3.1
Theorem 3.2
Definition 3.3: Semiparametric approach
