Decomposing the Jaccard Distance and the Jaccard Index in ABCDE
Stephan van Staden
TL;DR
The paper develops a decomposition of the Jaccard-based metrics used in ABCDE. It splits the magnitude of a clustering difference (the JaccardDistance) into Split and Merge components and further into Good and Bad contributions, and it decomposes the JaccardIndex into affected and unaffected parts with analogous quality splits. It gives exact population-level definitions together with practical, unbiased estimation procedures based on pairwise judgments over weighted samples, including strategies for sampling and for computing confidence intervals. The result is a rich set of metrics (with DeltaPrecision as a by-product) that are related to one another through simple equations, which supports deeper debugging and interactive exploration of clustering changes. The work positions itself as complementary to ABCDE, offering alternative perspectives and additional tooling for understanding the nature and quality of clustering changes, along with guidance on stratified sampling and implementation considerations. Its practical value lies in enabling more nuanced, human-judged evaluation of large-scale clustering changes and in providing a structured way to debug diffs through item-pair and cluster-level analyses.
Abstract
ABCDE is a sophisticated technique for evaluating differences between very large clusterings. Its main metric for characterizing the magnitude of the difference between two clusterings is the JaccardDistance, which is a true distance metric on the space of all clusterings of a fixed set of (weighted) items. The JaccardIndex is the complementary metric that characterizes the similarity of two clusterings, and the two are simply related: JaccardDistance + JaccardIndex = 1. This paper decomposes the JaccardDistance and the JaccardIndex further. In each case, the decomposition yields Impact and Quality metrics. The Impact metrics measure aspects of the magnitude of the clustering diff, while the Quality metrics use human judgements to measure how much the clustering diff improves the quality of the clustering. These decompositions offer more detailed insight into a clustering change, and they unlock new techniques for debugging and exploring the nature of the clustering diff. The new metrics are mathematically well-behaved and are interrelated via simple equations. While the work can be seen as an alternative formal framework for ABCDE, we prefer to view it as complementary: it offers a different perspective on the magnitude and the quality of a clustering change, and users can draw on whichever parts of each approach give them more insight into a change.
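To make the relationship JaccardDistance + JaccardIndex = 1 concrete, here is a minimal Python sketch. It assumes a per-item formulation (our reading of ABCDE, not taken from this text): for each item, its baseline cluster B and experiment cluster E are compared via the weighted symmetric difference and the weighted intersection, each divided by the weighted union, and the overall metrics are item-weighted averages of these per-item values. The function names `cluster_members` and `jaccard_metrics` are hypothetical and the code is an illustration, not ABCDE's implementation. Because the symmetric difference and the intersection partition the union, the per-item values sum to 1, and so do their weighted averages.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def cluster_members(clustering: Dict[Hashable, Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Invert an item -> cluster-id map into cluster-id -> set of member items."""
    members: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for item, cluster_id in clustering.items():
        members[cluster_id].add(item)
    return members


def jaccard_metrics(
    base: Dict[Hashable, Hashable],
    exp: Dict[Hashable, Hashable],
    weight: Dict[Hashable, float],
) -> Tuple[float, float]:
    """Return (JaccardDistance, JaccardIndex) as item-weighted averages.

    ASSUMPTION (hypothetical formulation): for an item with baseline cluster B
    and experiment cluster E, distance = w(B ^ E) / w(B | E) and
    index = w(B & E) / w(B | E). Weights are assumed positive, and both
    clusterings are assumed to cover exactly the items in `weight`.
    """
    base_members = cluster_members(base)
    exp_members = cluster_members(exp)

    def w(items: Set[Hashable]) -> float:
        # Total weight of a set of items.
        return sum(weight[x] for x in items)

    total = dist = index = 0.0
    for item, item_weight in weight.items():
        b = base_members[base[item]]
        e = exp_members[exp[item]]
        union = w(b | e)  # > 0 because the item itself lies in both clusters
        dist += item_weight * w(b ^ e) / union
        index += item_weight * w(b & e) / union
        total += item_weight
    return dist / total, index / total


# Four unit-weight items; item "c" moves from cluster 2 to cluster 1.
base = {"a": 1, "b": 1, "c": 2, "d": 2}
exp = {"a": 1, "b": 1, "c": 1, "d": 2}
weight = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
distance, similarity = jaccard_metrics(base, exp, weight)
print(round(distance, 4), round(similarity, 4))  # 0.4792 0.5208
print(round(distance + similarity, 12))  # 1.0
```

This exact computation visits every item and is only meant for small examples; the paper's estimation procedures rely on weighted sampling instead.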
