A cutting plane algorithm for globally solving low dimensional k-means clustering problems
Martin Ryner, Jan Kronqvist, Johan Karlsson
TL;DR
This work tackles the NP-hard problem of globally solving k-means for low-dimensional data by reformulating it as a concave assignment problem and solving it with a cutting-plane framework that alternates between a small concave subproblem and a large linear program. The authors prove that the gap between the upper and lower bounds converges to zero, which guarantees global optimality, and introduce acceleration techniques (symmetry breaking, branching, integer and tight constraints) to make the approach practical. They report substantial performance gains on synthetic data and, on MNIST, better alignment with the ground truth for the global solution than for local methods. Because the method maintains explicit bounds, it can be terminated early with a known optimality gap, and it offers a principled framework for exact clustering in settings with small $k$ and $d$. Future work is aimed at polytope refinements and parallelization for larger-scale problems.
Abstract
Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common such methods. There is a variety of approximate algorithms for the k-means problem, but computing the globally optimal solution is in general NP-hard. In this paper we consider the k-means problem for instances with low-dimensional data and formulate it as a structured concave assignment problem. This allows us to exploit the low-dimensional structure and solve the problem to global optimality within reasonable time for large data sets with several clusters. The method builds on iteratively solving a small concave problem and a large linear programming problem. This gives a sequence of feasible solutions along with bounds, and we show that the optimality gap converges to zero. The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results on their performance.
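To make the phrase "concave assignment problem" concrete, the sketch below checks a standard identity on a tiny instance: for a fixed assignment, the k-means cost equals $\sum_i \|x_i\|^2 - \sum_j \|s_j\|^2 / n_j$, where $s_j$ is the sum of the points in cluster $j$ and $n_j$ is its size. Each term $-\|s_j\|^2 / n_j$ is concave in $(s_j, n_j)$, both of which are linear in the assignment matrix, and $s_j$ lives in only $d$ dimensions. This is an illustration under assumptions, not the paper's algorithm: the function names `kmeans_cost`, `concave_cost`, and `brute_force` are hypothetical, the paper's exact formulation may differ, and exhaustive enumeration stands in for the cutting-plane method only to obtain a global optimum on eight points.

```python
import itertools
import numpy as np

def kmeans_cost(X, labels, k):
    """Classical k-means cost: squared distances of points to their cluster mean."""
    cost = 0.0
    for j in range(k):
        pts = X[labels == j]
        if len(pts):
            cost += ((pts - pts.mean(axis=0)) ** 2).sum()
    return cost

def concave_cost(X, labels, k):
    """Same cost via the assignment matrix: sum ||x_i||^2 - sum_j ||s_j||^2 / n_j."""
    G = np.eye(k)[labels]        # n x k one-hot assignment matrix
    s = G.T @ X                  # k x d cluster sums (linear in G)
    n = G.sum(axis=0)            # cluster sizes (linear in G)
    nonempty = n > 0             # empty clusters contribute nothing
    return (X ** 2).sum() - ((s[nonempty] ** 2).sum(axis=1) / n[nonempty]).sum()

def brute_force(X, k):
    """Global optimum by enumerating all k^n assignments (tiny n only)."""
    best_cost, best_labels = np.inf, None
    for combo in itertools.product(range(k), repeat=len(X)):
        labels = np.array(combo)
        cost = kmeans_cost(X, labels, k)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels

# Two well-separated groups of four points each in d = 2 (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (4, 2)), rng.normal(3.0, 0.3, (4, 2))])

best_cost, best_labels = brute_force(X, k=2)
print("global optimum cost:", best_cost)
print("assignment:", best_labels)
print("concave form agrees:", np.isclose(best_cost, concave_cost(X, best_labels, 2)))
```

Enumeration costs $k^n$ evaluations, so it is unusable beyond toy sizes; the point of exploiting the low-dimensional concave structure is to avoid exactly this blow-up.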
