The Mass Agreement Score: A Point-centric Measure of Cluster Size Consistency

Randolph Wiredu-Aidoo

Abstract

In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability: partitions that differ only slightly in their point assignments should receive similar uniformity scores. A difficulty arises because cluster labels are not fixed objects; algorithms may produce different numbers of labels even when the underlying point distribution changes very little. Measures defined directly over labels can therefore become unstable under label-count perturbations. I introduce the Mass Agreement Score (MAS), a point-centric metric bounded in [0, 1] that evaluates the consistency of expected cluster size as measured from the perspective of points in each cluster. Its construction yields fragment robustness by design, assigning similar scores to partitions with similar bulk structure while remaining sensitive to genuine redistribution of cluster mass.

Paper Structure (48 sections, 5 theorems, 86 equations, 1 figure, 4 tables)

Introduction
Related Work
Alternative Measures
Entropy-based measures.
Concentration indices and the HHI.
Inequality measures.
Cluster validity indices.
Size-biased sampling.
Positioning MAS
Proposed Method
Setup
Point-Centric Baseline
Leave-One-Out Baseline
Definition
Range and boundary case
...and 33 more sections

Key Result

Proposition A.1

$U = 1$ if and only if all clusters have equal size.

Figures (1)

Figure 1: MAS and entropy ($\mathrm{H}$) values across four partitions of $N = 10{,}020$. MAS remains near $1.00$ across partitions with the same bulk structure and drops significantly only when a large cluster ($\approx 0.5 N$) is split in half. Normalized entropy instead falls from $1.00$ to $0.51$ across similar partitions and rises to its second-highest value when the large cluster is split, demonstrating a label-centric focus. Unnormalized entropy increases primarily with label count, reflecting the larger entropy scale that additional labels induce.
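The contrast between label-centric and point-centric views described in the caption can be illustrated with a small sketch. It does not implement MAS, whose definition is not reproduced on this page. It compares normalized entropy with the size-biased mean cluster size (the expected size of the cluster containing a uniformly random point), a standard quantity used here only as a stand-in for a point-centric summary. The two partitions, `bulk` and `fragmented`, and both function names are hypothetical and are not the partitions shown in Figure 1.

```python
import math


def normalized_entropy(sizes):
    """Shannon entropy of cluster-size proportions divided by log(K)."""
    n = sum(sizes)
    k = len(sizes)
    if k < 2:
        # Convention assumed for this sketch: a single cluster scores 1.0.
        return 1.0
    h = -sum((s / n) * math.log(s / n) for s in sizes if s > 0)
    return h / math.log(k)


def point_centric_mean_size(sizes):
    """Expected size of the cluster containing a uniformly random point.

    Each cluster of size s is hit with probability s / n, so the
    expectation is sum(s * s) / n (the size-biased mean).
    """
    n = sum(sizes)
    return sum(s * s for s in sizes) / n


# Hypothetical partitions of N = 10,020 points (not those in Figure 1).
bulk = [5010, 5010]                   # two equal clusters
fragmented = [5000, 5000] + [1] * 20  # same bulk plus 20 singleton fragments

for name, sizes in [("bulk", bulk), ("fragmented", fragmented)]:
    print(
        name,
        "normalized entropy:", round(normalized_entropy(sizes), 3),
        "point-centric mean size:", round(point_centric_mean_size(sizes), 1),
    )
```

In this toy case the 20 singleton labels pull normalized entropy well below 0.5, because the normalizer grows from log 2 to log 22, while the point-centric mean size changes by less than one percent. This is the kind of label-count sensitivity the caption attributes to entropy.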

Theorems & Definitions (11)

Definition 3.1: Mass Agreement Score
Proposition A.1
proof
Proposition A.2
proof
Proposition A.3
proof
Proposition A.4
proof
Proposition A.5
...and 1 more
