BalLOT: Balanced $k$-means clustering with optimal transport
Wenyan Luo, Dustin G. Mixon
TL;DR
The paper addresses balanced $k$-means clustering by casting the assignment step as an optimal transport (OT) problem with balanced cluster masses and alternating it with centroid updates; the resulting algorithm is called BalLOT. An entropically regularized variant, E-BalLOT, is introduced for speed. On the theory side, the authors prove that BalLOT produces integral couplings for generic data, and a landscape analysis shows that the population objective is benign, with local minima aligned with the planted clusters. Finite-sample results include a basin-of-attraction analysis, one-step recovery under suitable initializations, and probabilistic misclustering bounds under the stochastic ball model. Empirically, BalLOT and E-BalLOT scale nearly linearly and achieve exact recovery rates competitive with SDP and matching-based methods.
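To make the entropically regularized assignment step concrete, the following is a minimal sketch of a Sinkhorn iteration that couples $n$ points to $k$ centroids with uniform marginals. It is not the authors' implementation: the function name `sinkhorn_balanced_coupling`, the parameters `eps` and `n_iter`, the cost rescaling, and the choice of uniform marginals (equal cluster sizes) are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_balanced_coupling(X, centroids, eps=0.05, n_iter=500):
    """Entropic OT coupling between n points and k centroids.

    Hypothetical helper: rows carry mass 1/n, columns receive mass 1/k.
    """
    n, k = X.shape[0], centroids.shape[0]
    # Squared Euclidean cost between every point and every centroid.
    C = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Rescale so eps is relative to the largest cost (an illustrative choice).
    C = C / C.max()
    K = np.exp(-C / eps)          # Gibbs kernel
    a = np.full(n, 1.0 / n)       # point marginal
    b = np.full(k, 1.0 / k)       # cluster marginal (balanced)
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iter):
        u = a / (K @ v)           # match row sums
        v = b / (K.T @ u)         # match column sums
    return u[:, None] * K * v[None, :]
```

A hard clustering can be read off with `P.argmax(axis=1)`, although rounding a soft coupling this way need not give exactly equal cluster sizes. Smaller `eps` brings the coupling closer to the unregularized one at the cost of slower convergence and possible underflow in `K`.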
Abstract
We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recovery of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
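The alternating scheme described in the abstract can be sketched as follows, under simplifying assumptions that are not taken from the paper: $k$ divides $n$, all clusters have equal size, and the balanced assignment step is solved as a linear assignment problem with each centroid replicated $n/k$ times (via SciPy's `linear_sum_assignment`) rather than with a general OT solver. When $k$ divides $n$, an optimal coupling can be taken to be an equal-size hard assignment, so this substitution yields an optimal balanced assignment. The names `balanced_assign` and `balanced_kmeans` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_assign(X, centroids):
    """Equal-size hard assignment minimizing total squared distance."""
    n, k = X.shape[0], centroids.shape[0]
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n")
    m = n // k
    C = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Replicate each centroid column m times so that a one-to-one
    # matching gives every cluster exactly m points.
    rows, cols = linear_sum_assignment(np.repeat(C, m, axis=1))
    labels = np.empty(n, dtype=int)
    labels[rows] = cols // m      # replicated column -> centroid index
    return labels

def balanced_kmeans(X, init_centroids, max_iter=100):
    """Alternate balanced assignment and centroid updates until stable."""
    centroids = np.asarray(init_centroids, dtype=float)
    k = centroids.shape[0]
    labels = None
    for _ in range(max_iter):
        new_labels = balanced_assign(X, centroids)
        if labels is not None and np.array_equal(new_labels, labels):
            break                 # assignment unchanged: converged
        labels = new_labels
        # Every cluster holds exactly n/k points, so none is empty.
        centroids = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids
```

The replicated assignment problem is square of size $n \times n$, so this sketch is only practical for small $n$; it is meant to illustrate the structure of the alternating minimization, not its scalability.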
