Fixed-sized clusters $k$-Means

Mikko I. Malinen, Pasi Fränti

TL;DR

We address constrained clustering by enforcing fixed cluster sizes in a $k$-means framework. The method replaces nearest-centroid assignment with a linear assignment problem, solved by the Hungarian algorithm over pre-allocated cluster slots. The edge weight between slot $a$ and point $i$ is $W(a,i) = \|X_i - C^t_{\arg\min_j c(j)\ge a}\|^2$, and centroids are updated as the means of their assigned points. The assignment step costs $O(n^3)$ per iteration, which makes datasets of roughly 5000 points feasible, and the algorithm converges to a local optimum. A seating-plan application demonstrates practical utility: a compatibility matrix is embedded via multidimensional scaling, and participants are allocated to tables of fixed sizes.
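
As a small worked illustration of the slot indexing in the edge weight, assume that $c(j)$ denotes the cumulative cluster size $c(j) = \sum_{l \le j} n_l$, where $n_l$ is the prescribed size of cluster $l$, and read $\arg\min_j c(j) \ge a$ as the smallest $j$ with $c(j) \ge a$. Both this reading and the symbol $n_l$ are assumptions made for the example; they are not spelled out in this summary. For two clusters of sizes 2 and 3, the five slots would then map to centroids as follows:

```latex
% Assumption: c(j) is the cumulative size of clusters 1..j.
\begin{align*}
n_1 = 2,\; n_2 = 3 \quad &\Rightarrow \quad c(1) = 2,\; c(2) = 5 \\
a \in \{1, 2\} \quad &\mapsto \quad j = 1 \\
a \in \{3, 4, 5\} \quad &\mapsto \quad j = 2 \\
W(1,i) = W(2,i) &= \|X_i - C^t_1\|^2 \\
W(3,i) = W(4,i) = W(5,i) &= \|X_i - C^t_2\|^2
\end{align*}
```

Under this reading, every point sees the same cost for all slots of a given cluster, and the number of slots per cluster is what enforces the size constraint.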

Abstract

We present a $k$-means-based clustering algorithm that optimizes the mean square error for given cluster sizes. A straightforward application is balanced clustering, where all clusters have equal size. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm, which makes the time complexity of the assignment phase $O(n^3)$. This enables clustering of datasets of more than 5000 points.
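
The assignment and update phases described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `hungarian` and `fixed_size_kmeans` are hypothetical, the Hungarian solver is a textbook potentials-based $O(n^3)$ variant, each cluster is assumed to own a consecutive run of slots equal to its prescribed size, and initial centroids are supplied by the caller.

```python
def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix.

    Textbook O(n^3) potentials-based Hungarian method (an assumed variant).
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-indexed)
    v = [0.0] * (m + 1)          # column potentials (1-indexed)
    p = [0] * (m + 1)            # p[j] = row currently matched to column j
    way = [0] * (m + 1)          # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            row_to_col[p[j] - 1] = j - 1
    return row_to_col


def fixed_size_kmeans(points, sizes, init_centroids, max_iter=100):
    """k-means with prescribed cluster sizes (all sizes must be positive)."""
    n, k = len(points), len(sizes)
    assert sum(sizes) == n, "cluster sizes must sum to the number of points"
    # Assumption: cluster j owns sizes[j] consecutive slots.
    slot_cluster = [j for j, s in enumerate(sizes) for _ in range(s)]
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment phase: squared distance from each point to each slot's centroid.
        cost = [
            [sum((x - c) ** 2 for x, c in zip(pt, centroids[slot_cluster[a]]))
             for a in range(n)]
            for pt in points
        ]
        slot_of_point = hungarian(cost)
        new_labels = [slot_cluster[a] for a in slot_of_point]
        if new_labels == labels:     # partition unchanged: local optimum reached
            break
        labels = new_labels
        # Update phase: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

A call such as `fixed_size_kmeans(points, [3, 3], [(0, 0), (10, 10)])` would request two clusters of three points each; passing equal sizes corresponds to the balanced-clustering case mentioned above. The cost matrix has $n \times n$ entries, so memory as well as the $O(n^3)$ assignment time limits the dataset size.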

Paper Structure

This paper contains 6 sections, 11 equations, 2 figures, 1 algorithm.

Figures (2)

  • Figure 1: Assigning points to centroids via cluster slots.
  • Figure 2: Minimum MSE calculation with fixed-sized clusters. Modeling with bipartite graph.