Fixed-sized clusters $k$-Means
Mikko I. Malinen, Pasi Fränti
TL;DR
We address constrained clustering by enforcing fixed cluster sizes in a $k$-means framework. The method replaces nearest-centroid assignment with a linear assignment problem, solved by the Hungarian algorithm over pre-allocated cluster slots, with edge weights $W(a,i) = \|X_i - C^t_{\arg\min_j c(j)\ge a}\|^2$; centroids are then updated as the means of their assigned points. The assignment step costs $O(n^3)$ per iteration, which keeps datasets of roughly 5000 points tractable, and the iteration converges to a local optimum. A seating-plan application demonstrates practical utility: a compatibility matrix is embedded via multidimensional scaling, and participants are allocated to tables of fixed sizes.
Abstract
We present a $k$-means-based clustering algorithm that minimizes the mean squared error subject to given cluster sizes. A straightforward application is balanced clustering, where all clusters have the same size. In the assignment phase of $k$-means, the algorithm solves an assignment problem with the Hungarian algorithm, which makes the time complexity of that phase $O(n^3)$. This still allows clustering of datasets with more than 5000 points.
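The assignment and update phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fixed_size_kmeans`, its parameters, and the random initialization are hypothetical, and each cluster is assumed to own as many assignment slots as its prescribed size. The sketch uses SciPy's `linear_sum_assignment`, which solves the same linear assignment problem but is not necessarily the Hungarian algorithm itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fixed_size_kmeans(X, sizes, max_iter=100, seed=0):
    """Illustrative sketch: k-means with prescribed cluster sizes.

    X      : (n, d) array of points
    sizes  : positive integers, one per cluster, summing to n
    Returns (labels, centroids).
    """
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    n, k = len(X), len(sizes)
    if np.any(sizes < 1) or sizes.sum() != n:
        raise ValueError("sizes must be positive and sum to the number of points")

    # Assumption: cluster j owns sizes[j] consecutive slots.
    # slot_cluster[a] is the cluster that slot a belongs to.
    slot_cluster = np.repeat(np.arange(k), sizes)

    # Assumption: initialize centroids as k distinct random data points.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(n, size=k, replace=False)]
    labels = None

    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Expand to an (n, n) point-by-slot cost matrix.
        cost = d2[:, slot_cluster]
        # Assignment phase: one point per slot at minimum total cost.
        rows, cols = linear_sum_assignment(cost)
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slot_cluster[cols]
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignment unchanged, so the centroids would not move
        labels = new_labels
        # Update phase: each centroid becomes the mean of its assigned points.
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centroids
```

Calling `fixed_size_kmeans(X, [n // k] * k)` with $n$ divisible by $k$ gives the balanced case. The point-by-slot cost matrix has $n^2$ entries, so memory as well as the cubic assignment cost limits the dataset sizes this sketch can handle.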
