More Clustering Quality Metrics for ABCDE

Stephan van Staden

TL;DR

This work extends ABCDE by (i) providing an approximate method to estimate the change in recall, (ii) introducing the IQ metric that quantifies how much a clustering diff improves quality relative to its size, and (iii) outlining procedures to assess absolute clustering quality (absolute $\mathrm{Precision}$ and $\mathrm{Recall}$) for a single snapshot. It develops exact per-item reasoning for $\Delta\mathrm{Recall}$ and two practical variants to recover key weights, then aggregates to the population via the set of affected items. The IQ framework links the magnitude of diffs to quality outcomes through a geometric interpretation and approximate calculations of distance to the ideal clustering. Finally, it proposes methods to derive absolute quality bounds by comparing a clustering snapshot against reference clusterings, enabling headroom estimation and more informed prioritization of clustering improvements.

Abstract

ABCDE is a technique for evaluating clusterings of very large populations of items. Given two clusterings, namely a Baseline clustering and an Experiment clustering, ABCDE can characterize their differences with impact and quality metrics, and thus help to determine which clustering to prefer. We previously described the basic quality metrics of ABCDE, namely the GoodSplitRate, BadSplitRate, GoodMergeRate, BadMergeRate and DeltaPrecision, and how to estimate them on the basis of human judgements. This paper extends that treatment with more quality metrics. It describes a technique that aims to characterize the DeltaRecall of the clustering change. It introduces a new metric, called IQ, to characterize the degree to which the clustering diff translates into an improvement in the quality. Ideally, a large diff would improve the quality by a large amount. Finally, this paper mentions ways to characterize the absolute Precision and Recall of a single clustering with ABCDE.

Paper Structure (14 sections, 62 equations, 3 figures)

Figures (3)

  • Figure 1: The clustering quality situation from the perspective of item $i$. Item $i$ always lies in the intersection of $\mathit{Base}(i)$, $\mathit{Exp}(i)$ and $\mathit{Ideal}(i)$, so this intersection is never empty. $\mathit{Ideal}(i)$ is the set of all items that are truly equivalent to $i$. Each area inside the Venn diagram is labeled with its weight. For example, $\mathrm{GoodSplitWeight} = \mathit{weight}((\mathit{Base}(i) \setminus \mathit{Exp}(i)) \setminus \mathit{Ideal}(i))$ is the weight of all items that were correctly split off from the perspective of $i$. The items in $(\mathit{Base}(i) \setminus \mathit{Exp}(i)) \setminus \mathit{Ideal}(i)$ are not equivalent to $i$, so removing them from the cluster of $i$ is good.
  • Figure 2: Several $\mathit{Exp}$ clusterings that have only quality improvements, i.e. only good splits and good merges and no bad splits or bad merges, and their associated $\mathit{IQ}$ values. All items are assumed to have equal weights.
  • Figure 3: A graphical representation of the $\mathit{IQ}$ metric in 2D space. The ideal clustering is the dot at the center, while $\mathit{Base}$ is some distance above it at the point where the curves meet. Each value of $\mathit{IQ}$ specifies a curve on which $\mathit{Exp}$ is located somewhere. Quality-positive changes have $\mathit{IQ} > 0$ and are colored green. Quality-neutral changes have $\mathit{IQ} = 0$ and are colored grey. Quality-negative changes have $\mathit{IQ} < 0$ and are colored red.
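The Venn diagram of Figure 1 can be turned into a small per-item computation. The sketch below is a hypothetical illustration, not the paper's implementation, and the helper names `weight`, `venn_weights` and `delta_recall` are made up for it. Only $\mathrm{GoodSplitWeight}$ is defined above. The other three region weights are assumed by symmetry, and per-item recall is assumed to be $\mathit{weight}(C(i) \cap \mathit{Ideal}(i)) / \mathit{weight}(\mathit{Ideal}(i))$, where $C(i)$ is the cluster of $i$. Under these assumptions, moving from $\mathit{Base}(i)$ to $\mathit{Exp}(i)$ loses the truly equivalent items that were split off and gains the truly equivalent items that were merged in. The per-item change is therefore $\Delta\mathrm{Recall}(i) = (\mathrm{GoodMergeWeight} - \mathrm{BadSplitWeight}) / \mathit{weight}(\mathit{Ideal}(i))$.

```python
from typing import Dict, Hashable, Set

Item = Hashable


def weight(items: Set[Item], w: Dict[Item, float]) -> float:
    """Total weight of a set of items (hypothetical helper)."""
    return sum(w[x] for x in items)


def venn_weights(base: Set[Item], exp: Set[Item], ideal: Set[Item],
                 w: Dict[Item, float]) -> Dict[str, float]:
    """Region weights from the perspective of one item i.

    base, exp and ideal stand for Base(i), Exp(i) and Ideal(i);
    each set is expected to contain i itself.
    """
    split = base - exp    # items that left i's cluster
    merged = exp - base   # items that joined i's cluster
    return {
        "GoodSplitWeight": weight(split - ideal, w),   # as defined in Figure 1
        "BadSplitWeight": weight(split & ideal, w),    # assumed by symmetry
        "GoodMergeWeight": weight(merged & ideal, w),  # assumed by symmetry
        "BadMergeWeight": weight(merged - ideal, w),   # assumed by symmetry
    }


def delta_recall(base: Set[Item], exp: Set[Item], ideal: Set[Item],
                 w: Dict[Item, float]) -> float:
    """Per-item change in recall under the ASSUMED definition
    Recall(i) = weight(C(i) & Ideal(i)) / weight(Ideal(i))."""
    v = venn_weights(base, exp, ideal, w)
    return (v["GoodMergeWeight"] - v["BadSplitWeight"]) / weight(ideal, w)
```

As an example, use unit weights with $\mathit{Base}(i) = \{i, a, x\}$, $\mathit{Exp}(i) = \{i, a, b\}$ and $\mathit{Ideal}(i) = \{i, a, b, c\}$. The change splits off $x$, which is a good split, and merges in $b$, which is a good merge. Recall therefore rises from $2/4$ to $3/4$, and `delta_recall` returns $0.25$.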
