Scalable Uncertainty Quantification for Black-Box Density-Based Clustering

Nicola Bariletto, Stephen G. Walker

TL;DR

A framework for uncertainty quantification in clustering that combines the martingale posterior paradigm with density-based clustering, so that uncertainty in the estimated density propagates naturally to the clustering structure.

Abstract

We introduce a novel framework for uncertainty quantification in clustering. By combining the martingale posterior paradigm with density-based clustering, uncertainty in the estimated density is naturally propagated to the clustering structure. The approach scales effectively to high-dimensional and irregularly shaped data by leveraging modern neural density estimators and GPU-friendly parallel computation. We establish frequentist consistency guarantees and validate the methodology on synthetic and real data.
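
A minimal one-dimensional sketch of the pipeline the abstract describes: draw a density from an (approximate) martingale posterior, cluster it by upper-level sets, and repeat to obtain a distribution over clusterings. Every component here is an assumption, not the authors' method: a Gaussian kernel density estimate stands in for the neural density estimators, the Pólya-urn (empirical) predictive stands in for the paper's predictive rule, the infinite forward horizon is truncated at a finite `n_forward`, and the bandwidth, level `t`, data, and function names are hypothetical. The only uncertainty summary shown is the number of clusters.

```python
import numpy as np

def polya_urn_resample(data, n_forward, rng):
    """Hypothetical helper: forward-sample n_forward points from the Polya-urn
    (empirical) predictive. Each new point copies a uniformly chosen earlier
    point, previously imputed points included."""
    n = len(data)
    seq = np.empty(n + n_forward)
    seq[:n] = data
    for i in range(n, n + n_forward):
        seq[i] = seq[rng.integers(0, i)]   # uniform over the i points so far
    return seq

def kde(grid, sample, bandwidth):
    """Gaussian kernel density estimate of `sample` evaluated on `grid`
    (a stand-in for a neural density estimator)."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))

def n_clusters(density, t):
    """Number of connected components of {x : density(x) >= t} on a 1D grid."""
    above = (density >= t).astype(int)
    # One component per upward crossing, plus one if the set touches the left edge.
    return int(above[0] + np.sum(np.diff(above) == 1))

rng = np.random.default_rng(0)
# Hypothetical data: two well-separated Gaussian groups.
data = np.concatenate([rng.normal(-2.0, 0.7, 100), rng.normal(2.0, 0.7, 100)])
grid = np.linspace(-6.0, 6.0, 400)

counts = []
for _ in range(50):                        # 50 approximate posterior draws
    completed = polya_urn_resample(data, n_forward=500, rng=rng)
    density = kde(grid, completed, bandwidth=0.4)
    counts.append(n_clusters(density, t=0.05))

# Posterior frequency of each cluster count.
values, freq = np.unique(counts, return_counts=True)
print(dict(zip(values.tolist(), (freq / len(counts)).tolist())))
```

Each posterior draw is independent of the others, which is what makes this kind of procedure easy to parallelize; the abstract's GPU-friendly computation refers to the authors' own implementation, not to this loop.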
Paper Structure (31 sections, 3 theorems, 16 equations, 3 figures, 1 algorithm)

Key Result

Theorem 1

Assume that [...]. Then $\theta_{n,\infty}:=\lim_{N\to\infty}\theta_{n,N}$ is a well-defined random variable on $(\Omega,\mathcal{F},\mathbb P)$.
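
Theorem 1 concerns the limit of $\theta_{n,N}$ as the forward-sampling horizon $N$ grows. As a hedged illustration only, and not the paper's setting: under the Pólya-urn (empirical) predictive, take $\theta_{n,N}$ to be the mean of the predictive after $N$ forward steps. This quantity is a martingale in $N$ and stays within the range of the observed data, so by the martingale convergence theorem it converges almost surely. The choice of $\theta$, the predictive rule, the data, and the function name are all assumptions.

```python
import numpy as np

def predictive_mean_path(data, n_forward, rng):
    """Hypothetical helper: run Polya-urn predictive resampling and return
    theta_{n,N} for N = 0, ..., n_forward, where theta_{n,N} is taken to be
    the mean of the first n + N points (the predictive mean after N steps)."""
    n = len(data)
    seq = np.empty(n + n_forward)
    seq[:n] = data
    for i in range(n, n + n_forward):
        seq[i] = seq[rng.integers(0, i)]   # copy a uniformly chosen earlier point
    # Running mean of the first n + N points, for each N = 0, 1, ..., n_forward.
    return np.cumsum(seq)[n - 1:] / np.arange(n, n + n_forward + 1)

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 50)           # hypothetical observed sample
path = predictive_mean_path(data, n_forward=20_000, rng=rng)
for N in (0, 100, 1_000, 10_000, 20_000):
    print(N, round(float(path[N]), 4))
```

Successive printed values should change less and less as $N$ increases along a single path, while different random seeds settle at different limits; that spread across seeds is the posterior uncertainty about $\theta_{n,\infty}$.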

Figures (3)

  • Figure 1: Illustration of density-based clustering (DBC). The plotted density has two clusters, labeled $C_1$ and $C_2$, corresponding to the two connected components of the upper-level set at level $t$.
  • Figure 2: Noisy concentric circles experiment.
  • Figure 3: MNIST digits experiment.
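
The Figure 1 caption defines clusters as the connected components of the upper-level set $\{x : f(x) \ge t\}$. A minimal sketch of that definition in one dimension, assuming a known density (a hypothetical equal mixture of two normals) evaluated on a fine grid; the grid discretization, the level $t = 0.1$, and the function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def upper_level_set_clusters(grid, density, t):
    """Hypothetical helper: return the connected components of
    {x : density(x) >= t} on a 1D grid as (left, right) grid endpoints."""
    above = density >= t
    clusters = []
    start = None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                                    # a component begins
        elif not flag and start is not None:
            clusters.append((grid[start], grid[i - 1]))  # the component ends
            start = None
    if start is not None:                                # component reaching the right edge
        clusters.append((grid[start], grid[-1]))
    return clusters

# Hypothetical bimodal density: equal mixture of N(-2, 0.7^2) and N(2, 0.7^2).
grid = np.linspace(-6.0, 6.0, 1201)
density = 0.5 * gaussian_pdf(grid, -2.0, 0.7) + 0.5 * gaussian_pdf(grid, 2.0, 0.7)

clusters = upper_level_set_clusters(grid, density, t=0.1)
print(len(clusters))  # 2
```

At $t = 0.1$ the density dips well below the level between the two modes, so the upper-level set splits into two intervals, playing the roles of $C_1$ and $C_2$. Raising or lowering $t$ changes which components appear, which is why the level is part of the clustering definition.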

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Definition 1