A Pragmatic Method for Comparing Clusterings with Overlaps and Outliers

Ryan DeWolfe, Paweł Prałat, François Théberge

TL;DR

The paper introduces $F^*_{wo}$, a pragmatic similarity measure for comparing clusterings that may contain overlaps and outliers. Built on set-matching and the $F^*$ (Jaccard-like) cluster similarity, it matches clusters across clusterings, weights by cluster size, and symmetrizes the score, with an explicit outlier term to handle unclustered objects. The paper analyzes the measure's fundamental properties (normalization, label invariance, symmetry, and robustness to small changes) and discusses why a metric is not strictly necessary for practical clustering evaluation. Through intuitive experiments and graph-aware benchmarks on synthetic ABCD+o$^2$ data, the authors demonstrate that $F^*_{wo}$ avoids common biases seen in Omega, oNMI, and ECS while remaining computationally efficient. They also provide a Python implementation and advocate evaluating clusterings of graphs from both the vertex and the edge perspective when appropriate.
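The TL;DR names the ingredients of $F^*_{wo}$ (set-matching, the $F^*$ similarity, size weighting, symmetrization, and an outlier term) but not the formula. The sketch below shows one way such ingredients can fit together. For crisp sets, $F^*$ reduces to the Jaccard index, $|a \cap b| / |a \cup b|$. Everything else is an assumption and not the paper's definition: each cluster is matched to its best $F^*$ partner, matches are weighted by cluster size, the two directions are averaged, and outliers enter as one extra block per clustering. The names `f_star`, `with_outlier_block`, `one_sided`, and `sketch_similarity` are hypothetical and do not come from the authors' code.

```python
from typing import List, Set


def f_star(a: Set[int], b: Set[int]) -> float:
    """F* similarity of two crisp sets, which equals |a & b| / |a | b| (Jaccard)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def with_outlier_block(clusters: List[Set[int]], universe: Set[int]) -> List[Set[int]]:
    """ASSUMPTION: treat the unclustered objects as one extra block."""
    covered = set().union(*clusters)
    return [set(c) for c in clusters] + [set(universe) - covered]


def one_sided(A: List[Set[int]], B: List[Set[int]]) -> float:
    """Size-weighted mean, over blocks a in A, of a's best F* match in B."""
    total = sum(len(a) for a in A)
    if total == 0:
        return 1.0
    return sum(len(a) * max(f_star(a, b) for b in B) for a in A) / total


def sketch_similarity(A: List[Set[int]], B: List[Set[int]], universe: Set[int]) -> float:
    """Hypothetical symmetric score (not the paper's F*_wo): mean of both directions."""
    A_ = with_outlier_block(A, universe)
    B_ = with_outlier_block(B, universe)
    return 0.5 * (one_sided(A_, B_) + one_sided(B_, A_))


universe = set(range(6))
truth = [{0, 1, 2}, {2, 3, 4}]   # object 2 is in two clusters; object 5 is an outlier
detected = [{0, 1, 2}, {3, 4}]   # object 5 is again an outlier
print(round(sketch_similarity(truth, detected, universe), 3))  # prints 0.873
```

On identical clusterings this sketch returns 1.0, and swapping its arguments leaves the score unchanged. These are quick sanity checks in the spirit of the normalization and symmetry properties the paper analyzes; the sketch makes no claim to reproduce the actual values of $F^*_{wo}$.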

Abstract

Clustering algorithms are an essential part of the unsupervised data science ecosystem, and extrinsic evaluation of clustering algorithms requires a method for comparing the detected clustering to a ground truth clustering. In a general setting, the detected and ground truth clusterings may have outliers (objects belonging to no cluster), overlapping clusters (objects may belong to more than one cluster), or both, but methods for comparing these clusterings are currently undeveloped. In this note, we define a pragmatic similarity measure for comparing clusterings with overlaps and outliers, show that it has several desirable properties, and experimentally confirm that it is not subject to several common biases afflicting other clustering comparison measures.

Paper Structure

This paper contains 10 sections, 8 theorems, 36 equations, and 3 figures.

Key Result

Proposition 1

The proposed measure $F^*_{wo}$ has the following properties.

Figures (3)

  • Figure 1: Recreation of the experiment in Figure 2 of [ecc] with $F^*_{wo}$. Each column represents one scenario; its rows show a visual of the scenario, the intuitive behaviour of a similarity measure in that scenario, and the actual behaviour of several measures. Each line is the average of $100$ simulations, and the shaded region (usually too small to see) covers plus or minus one standard deviation. In each scenario, the proposed $F^*_{wo}$ measure matches our intuition.
  • Figure 2: Behaviour of comparison measures on intuitive scenarios when overlaps or outliers are present. The top row shows the scenario with overlapping clusters and the second row the scenario with outliers; the columns, from left to right, show a visual of the scenario, the intuitive behaviour, and the actual behaviour of each method.
  • Figure 3: An experiment on graph-aware clustering comparison, similar to that of Figure 5 from [gam]. We run the Leiden clustering algorithm at three resolutions ($0.5$, $1.0$, and $3.75$, shown in red, green, and yellow, respectively) on ABCD graphs with 2000 vertices and show the similarity of each detected vertex clustering to the ground-truth vertex clusters, the similarity of the detected induced edge clustering to the ground-truth induced edge clusters, and the number of clusters. The solid line represents the average over 10 samples, and the shaded region covers plus or minus one standard deviation.
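Figure 3 compares vertex clusterings with the edge clusterings they induce, but the caption does not spell out the induction. The sketch below works under stated assumptions. An edge joins every cluster that contains both of its endpoints. Edges lying in no cluster become edge outliers. Clusters that induce no edges are dropped. The helper `induced_edge_clustering` is hypothetical and is not taken from the authors' implementation.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def induced_edge_clustering(
    edges: Iterable[Edge], clusters: List[Set[int]]
) -> Tuple[List[frozenset], Set[Edge]]:
    """Hypothetical helper: lift a vertex clustering to an edge clustering.

    ASSUMPTIONS (not from the paper): an edge belongs to every cluster that
    contains both endpoints; an edge in no cluster is an edge outlier;
    clusters that induce no edges are dropped.
    """
    # Store each undirected edge once, with its endpoints in sorted order.
    norm = sorted({tuple(sorted(e)) for e in edges})
    edge_clusters = []
    for c in clusters:
        members = frozenset(e for e in norm if e[0] in c and e[1] in c)
        if members:
            edge_clusters.append(members)
    covered = set().union(*edge_clusters)
    outliers = {e for e in norm if e not in covered}
    return edge_clusters, outliers


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
vertex_clusters = [{0, 1, 2}, {2, 3, 4}]  # vertex 2 belongs to both clusters
edge_clusters, edge_outliers = induced_edge_clustering(edges, vertex_clusters)
print(edge_outliers)  # prints {(4, 5)}
```

Under these assumptions, the induced edge clustering can itself have overlaps (edges inside the intersection of two clusters) and outliers (edges between or outside clusters). Comparing edge clusterings therefore calls for a measure that handles both.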

Theorems & Definitions (15)

  • Proposition 1
  • Proposition 2
  • Proof
  • Remark
  • Theorem 3
  • Lemma 4
  • Remark
  • Theorem (restatement of thm:robust)
  • Lemma 5
  • Proof
  • ...and 5 more