DelTriC: A Novel Clustering Method with Accurate Outlier

Tomas Javurek, Michal Gregor, Sebastian Kula, Marian Simko

TL;DR

DelTriC tackles clustering in high-dimensional data with irregular shapes and simultaneous outlier detection. It projects data to a 2D proxy, builds a Delaunay triangulation, and back-projects to the original space to prune edges and merge clusters, aiming to preserve high-dimensional geometry. Key contributions include a sigma-based pruning framework with MAD normalization, a conservative centroid-based merging step, and a back-projection mechanism that improves anomaly detection. Empirical results on synthetic and real-world datasets show competitive clustering performance with stronger anomaly detection and favorable scalability compared to DBSCAN/HDBSCAN.
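To make the "sigma-based pruning with MAD normalization" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that edge lengths are scored as robust z-scores (deviation from the median, divided by the scaled median absolute deviation) and that an edge is pruned when its score exceeds a threshold. The function names, the threshold `k = 3.0`, and the consistency constant 1.4826 (the usual factor that makes MAD comparable to a standard deviation under normality) are illustrative assumptions; the summary above does not state the exact rule.

```python
import statistics


def mad_sigma_scores(lengths):
    """Robust z-scores of edge lengths: (x - median) / (1.4826 * MAD).

    Hypothetical helper; the scaling constant is an assumption.
    """
    med = statistics.median(lengths)
    mad = statistics.median([abs(x - med) for x in lengths])
    scale = 1.4826 * mad
    if scale == 0:
        # All lengths (or at least half of them) coincide with the median:
        # no meaningful spread, so nothing is flagged.
        return [0.0] * len(lengths)
    return [(x - med) / scale for x in lengths]


def prune_mask(lengths, k=3.0):
    """Return True for edges to keep, False for edges that are too long.

    Only unusually long edges are pruned; short edges are always kept.
    """
    return [score <= k for score in mad_sigma_scores(lengths)]


if __name__ == "__main__":
    edge_lengths = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
    print(prune_mask(edge_lengths))
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being inflated by the very edges that should be pruned, which is the usual motivation for MAD-based normalization.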

Abstract

The paper introduces DelTriC (Delaunay Triangulation Clustering), a clustering algorithm which integrates PCA/UMAP-based projection, Delaunay triangulation, and a novel back-projection mechanism to form clusters in the original high-dimensional space. DelTriC decouples neighborhood construction from decision-making by first triangulating in a low-dimensional proxy to index local adjacency, and then back-projecting to the original space to perform robust edge pruning, merging, and anomaly detection. DelTriC can outperform traditional methods such as k-means, DBSCAN, and HDBSCAN in many scenarios; it is both scalable and accurate, and it also significantly improves outlier detection.
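The project, triangulate, back-project sequence described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the DelTriC algorithm itself: PCA (computed here with a plain SVD) stands in for the PCA/UMAP projection, the pruning rule is a simple median-plus-`k`-scaled-MAD cutoff chosen for illustration, clusters are taken as connected components of the pruned graph, and the centroid-based merging and anomaly-detection steps are omitted. All function names and the default `k = 3.0` are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import Delaunay


def pca_2d(X):
    """Project centred data onto its first two principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def delaunay_edges(P):
    """Unique undirected edges (i < j) of the 2D Delaunay triangulation."""
    simplices = Delaunay(P).simplices
    edges = np.vstack(
        [simplices[:, [0, 1]], simplices[:, [1, 2]], simplices[:, [0, 2]]]
    )
    return np.unique(np.sort(edges, axis=1), axis=0)


def cluster_sketch(X, k=3.0):
    """Cluster X using 2D adjacency but original-space edge lengths."""
    P = pca_2d(X)                      # low-dimensional proxy
    edges = delaunay_edges(P)          # adjacency indexed in the proxy
    # Back-projection: measure each edge in the original space.
    d = np.linalg.norm(X[edges[:, 0]] - X[edges[:, 1]], axis=1)
    med = np.median(d)
    scale = 1.4826 * np.median(np.abs(d - med))
    kept = edges[d <= med + k * scale]  # assumed pruning rule
    n = len(X)
    graph = coo_matrix(
        (np.ones(len(kept)), (kept[:, 0], kept[:, 1])), shape=(n, n)
    )
    _, labels = connected_components(graph, directed=False)
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(100, 10))
    b = rng.normal(0.0, 1.0, size=(100, 10))
    b[:, 0] += 200.0                   # second blob, far away
    labels = cluster_sketch(np.vstack([a, b]))
    print(len(np.unique(labels)), "components")
```

The point of the design is that the triangulation only decides which pairs of points are compared; whether an edge survives is decided by its length in the original space. In this simplified version, points whose edges are all pruned end up as singleton components, which is one crude way to read them as outlier candidates.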

Paper Structure

This paper contains 35 sections, 14 equations, 8 figures, 13 tables, 1 algorithm.

Figures (8)

  • Figure 1: True labels and the results of all tested algorithms on a benchmark with $n_{\text{points}}=5000$ and $\text{dim}=50$.
  • Figure 2: Illustrative examples: (a) anomaly detection with DelTriC, (b) clustering with triangulation.
  • Figure 3: A simple test to demonstrate that outliers can end up as internal points of a cluster after dimensionality reduction.
  • Figure 4: A simple test to demonstrate that outliers can end up as internal points of a cluster after dimensionality reduction.
  • Figure 5: A simple test to demonstrate pruning step degradation when manifolds get very close.
  • ...and 3 more figures