The Mass Agreement Score: A Point-centric Measure of Cluster Size Consistency

Randolph Wiredu-Aidoo

Abstract

In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability: partitions that differ only slightly in their point assignments should receive similar uniformity scores. A difficulty arises because cluster labels are not fixed objects; algorithms may produce different numbers of labels even when the underlying point distribution changes very little. Measures defined directly over labels can therefore become unstable under label-count perturbations. I introduce the Mass Agreement Score (MAS), a point-centric metric bounded in [0, 1] that evaluates the consistency of expected cluster size as measured from the perspective of points in each cluster. Its construction yields fragment robustness by design, assigning similar scores to partitions with similar bulk structure while remaining sensitive to genuine redistribution of cluster mass.

Paper Structure (48 sections, 5 theorems, 86 equations, 1 figure, 4 tables)


Key Result

Proposition A.1

$U = 1$ if and only if all clusters have equal size.

Figures (1)

  • Figure 1: MAS and entropy ($\mathrm{H}$) values across four partitions of $N = 10{,}020$. MAS remains near $1.00$ across partitions with the same bulk structure and drops substantially only when a large cluster ($\approx 0.5 N$) is split in half. Normalized entropy instead falls from $1.00$ to $0.51$ across similar partitions and rises to its second-highest value when the large cluster is split, showing that it tracks labels rather than points. Unnormalized entropy rises mainly with label count, because its maximum ($\log k$ for $k$ labels) grows as labels are added.
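
The label-count sensitivity of entropy described in the Figure 1 caption can be illustrated with a small sketch. The three partitions below are hypothetical and are not the ones used in the figure, so the values they produce will differ from those reported; they only share the total $N = 10{,}020$. The helper names `entropy` and `normalized_entropy` are chosen here for illustration, and normalization by $\log k$ is assumed. MAS itself is not implemented, since its definition is not reproduced in this summary.

```python
import math


def entropy(sizes):
    """Shannon entropy (in nats) of the cluster-size distribution."""
    n = sum(sizes)
    return -sum((s / n) * math.log(s / n) for s in sizes if s > 0)


def normalized_entropy(sizes):
    """Entropy divided by log(k), where k is the number of non-empty clusters.

    Assumption: this is the usual normalization; it is undefined for k < 2.
    """
    k = sum(1 for s in sizes if s > 0)
    if k < 2:
        raise ValueError("normalized entropy needs at least two clusters")
    return entropy(sizes) / math.log(k)


# Hypothetical partitions of N = 10,020 points (not the paper's partitions).
balanced = [5010, 5010]                # two equal clusters
fragmented = [5000, 5000] + [1] * 20   # same bulk, plus 20 singleton labels
split = [5010, 2505, 2505]             # one large cluster split in half

for name, sizes in [("balanced", balanced),
                    ("fragmented", fragmented),
                    ("split", split)]:
    print(f"{name:>10}: k={len(sizes):2d}  "
          f"H={entropy(sizes):.3f}  H_norm={normalized_entropy(sizes):.3f}")
```

In this sketch, adding 20 singleton labels moves only 20 of 10,020 points, yet normalized entropy falls sharply because $\log k$ grows much faster than the entropy itself. Splitting a large cluster, which redistributes a quarter of the mass, leaves normalized entropy comparatively high.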

Theorems & Definitions (11)

  • Definition 3.1: Mass Agreement Score
  • Proposition A.1
  • proof
  • Proposition A.2
  • proof
  • Proposition A.3
  • proof
  • Proposition A.4
  • proof
  • Proposition A.5
  • ...and 1 more