PretopoMD: Pretopology-based Mixed Data Hierarchical Clustering

Loup-Noe Levy, Guillaume Guerard, Sonia Djebali, Soufian Ben Amor

TL;DR

The paper addresses clustering of heterogeneous data without dimensionality reduction by introducing PretopoMD, a pretopology-based hierarchical clustering method. It uses a logical formula in Disjunctive Normal Form (DNF) to encode customizable rules over prenetworks derived from each feature, which lets it handle mixed data types directly. The approach yields an explainable, dendrogram-based clustering structure and supports adjustable seeds and hyperparameters to control granularity. Empirical results on synthetic and real mixed datasets (e.g., Palmer Penguins, Sponge) show that PretopoMD can produce interpretable clusters with competitive metrics in certain configurations, at the price of higher computational cost and careful hyperparameter tuning. The work highlights the potential of direct mixed-data clustering and suggests future work on efficiency gains and broader domain applications.

Abstract

This article presents a novel pretopology-based algorithm designed to address the challenges of clustering mixed data without the need for dimensionality reduction. Leveraging Disjunctive Normal Form, our approach formulates customizable logical rules and adjustable hyperparameters that allow for user-defined hierarchical cluster construction and facilitate tailored solutions for heterogeneous datasets. Through hierarchical dendrogram analysis and comparative clustering metrics, our method demonstrates superior performance by accurately and interpretably delineating clusters directly from raw data, thus preserving data integrity. Empirical findings highlight the algorithm's robustness in constructing meaningful clusters and reveal its potential in overcoming issues related to clustered data explainability. The novelty of this work lies in its departure from traditional dimensionality reduction techniques and its innovative use of logical rules that enhance both cluster formation and clarity, thereby contributing a significant advancement to the discourse on clustering mixed data.
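
To make the idea of DNF rules over per-feature prenetworks more concrete, here is a minimal Python sketch. It is not the authors' implementation: the helper names (`numeric_close`, `same_category`, `pseudoclosure`), the threshold, the toy records, and the choice that a literal holds when a point is related to at least one member of the set are all assumptions made for illustration. Each feature contributes one relation (one prenetwork), and a DNF formula, written as a list of conjunctive clauses, decides whether a point joins the expanded set.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, object]
Relation = Callable[[Record, Record], bool]


def numeric_close(feature: str, eps: float) -> Relation:
    """Hypothetical prenetwork for a numeric feature: values within eps."""
    def related(x: Record, y: Record) -> bool:
        return abs(x[feature] - y[feature]) <= eps
    return related


def same_category(feature: str) -> Relation:
    """Hypothetical prenetwork for a categorical feature: equal labels."""
    def related(x: Record, y: Record) -> bool:
        return x[feature] == y[feature]
    return related


def pseudoclosure(A: Set[int], data: List[Record],
                  prenetworks: Dict[str, Relation],
                  dnf: List[List[str]]) -> Set[int]:
    """Expand A by every point that satisfies the DNF formula.

    A literal (a feature name) is true for point x when x is related,
    in that feature's prenetwork, to at least one member of A.
    The formula is an OR over clauses, each clause an AND over literals.
    """
    result = set(A)
    for i, x in enumerate(data):
        if i in A:
            continue

        def literal(name: str) -> bool:
            return any(prenetworks[name](x, data[j]) for j in A)

        if any(all(literal(name) for name in clause) for clause in dnf):
            result.add(i)
    return result


# Toy records with one numeric and one categorical feature (made-up values).
data = [
    {"mass": 3.7, "species": "A"},
    {"mass": 3.8, "species": "A"},
    {"mass": 5.0, "species": "B"},
    {"mass": 3.75, "species": "B"},
]
prenetworks = {
    "mass": numeric_close("mass", 0.2),
    "species": same_category("species"),
}

dnf_and = [["mass", "species"]]    # mass AND species
dnf_or = [["mass"], ["species"]]   # mass OR species

print(pseudoclosure({0}, data, prenetworks, dnf_and))  # {0, 1}
print(pseudoclosure({0}, data, prenetworks, dnf_or))   # {0, 1, 3}
```

In this sketch the conjunctive rule is stricter and admits only the point that matches the seed on both features, while the disjunctive rule also admits a point that is close in mass alone. Changing the formula is one way such a method could trade off granularity without transforming the raw features.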

Paper Structure

This paper contains 27 sections, 4 theorems, 5 equations, 10 figures, 7 tables, 6 algorithms.

Key Result

Proposition 1

Intersection of Closures. In a pretopological space of type $V$, the intersection of closures is a closure.
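
The proposition can be illustrated on a small directed graph. The sketch below is an assumption-laden example rather than the paper's construction: it reads the statement as "the intersection of closed sets is closed", uses a hypothetical pseudoclosure that adds the direct successors of a set (which is extensive and isotone, as a type $V$ space requires), and obtains the closure by iterating the pseudoclosure until it stops growing.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph given as successor lists.
SUCC: Dict[int, List[int]] = {0: [1], 1: [2], 2: [3], 3: [4], 4: [], 5: [4]}


def graph_pseudoclosure(A: Set[int], succ: Dict[int, List[int]]) -> Set[int]:
    """a(A): A together with the direct successors of its elements."""
    result = set(A)
    for x in A:
        result.update(succ[x])
    return result


def closure(A: Set[int], succ: Dict[int, List[int]]) -> Tuple[Set[int], int]:
    """Iterate a(.) until a fixed point; return the closed set and step count."""
    current, steps = set(A), 0
    while True:
        nxt = graph_pseudoclosure(current, succ)
        if nxt == current:
            return current, steps
        current, steps = nxt, steps + 1


def is_closed(A: Set[int], succ: Dict[int, List[int]]) -> bool:
    """A set is closed when the pseudoclosure leaves it unchanged."""
    return graph_pseudoclosure(A, succ) == A


F1, steps1 = closure({0}, SUCC)   # reached after 4 applications of a(.)
F2, steps2 = closure({5}, SUCC)
common = F1 & F2

print(F1, steps1)                 # {0, 1, 2, 3, 4} 4
print(F2, steps2)                 # {4, 5} 1
print(common, is_closed(common, SUCC))  # {4} True
```

The reason the check succeeds in general for an isotone, extensive pseudoclosure $a$: for closed sets $F$ and $G$, $F \cap G \subseteq a(F \cap G) \subseteq a(F) \cap a(G) = F \cap G$, so $F \cap G$ is a fixed point of $a$.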

Figures (10)

  • Figure 1: Example of a pseudoclosure function.
  • Figure 2: Closure of set $A$, $a^4(A) = F(A)$.
  • Figure 3: Pseudoclosure function on a graph.
  • Figure 4: Filters vs Prefilters.
  • Figure 5: Neighborhood definition of a pretopology.
  • ...and 5 more figures

Theorems & Definitions (12)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • ...and 2 more