A Global Optimization Algorithm for K-Center Clustering of One Billion Samples
Jiayang Ren, Ningning You, Kaixun Hua, Chaojie Ji, Yankai Cao
TL;DR
This work addresses the large-scale $K$-center clustering problem by introducing a tailored reduced-space branch-and-bound algorithm that guarantees finite-step convergence by branching only on the region of cluster centers. It features a two-stage decomposable lower bound with a closed-form solution, and accelerates pruning via bounds tightening, sample reduction, and parallelization, all implemented in Julia. Empirical results show the method solves datasets from $10^7$ to $10^9$ samples within 4 hours and achieves an average $25.8\%$ improvement in the objective over state-of-the-art heuristics. The approach enables globally optimal clustering at unprecedented scales, with open-source code and potential extensions to constrained variants.
Abstract
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers so as to minimize the maximum within-cluster distance. The algorithm is based on a reduced-space branch-and-bound scheme and guarantees convergence to the global optimum in a finite number of steps by branching only on the regions of centers. To improve efficiency, we design a two-stage decomposable lower bound whose solution can be derived in closed form. We also propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets demonstrate that our algorithm can solve K-center problems to global optimality within 4 hours for ten million samples in serial mode and one billion samples in parallel mode. Moreover, compared with state-of-the-art heuristic methods, the global optimum obtained by our algorithm reduces the objective function by 25.8% on average across all the synthetic and real-world datasets.
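To make the objective concrete, the following minimal Python sketch evaluates the K-center objective (the largest distance from any sample to its nearest center) and finds the global optimum on a toy dataset by exhaustive enumeration. It is an illustration of the problem definition only, not the branch-and-bound algorithm proposed in the paper; the function names `kcenter_objective` and `brute_force_kcenter`, the toy data, and the use of the Euclidean distance are assumptions made for this example.

```python
from itertools import combinations
import math


def kcenter_objective(samples, centers):
    """Return the maximum, over all samples, of the distance to the nearest center."""
    return max(min(math.dist(x, c) for c in centers) for x in samples)


def brute_force_kcenter(samples, k):
    """Try every choice of k samples as centers and keep the best one.

    Illustrative only: the number of candidates grows combinatorially,
    so this is viable only for very small inputs.
    """
    best_centers, best_value = None, math.inf
    for candidate in combinations(samples, k):
        value = kcenter_objective(samples, candidate)
        if value < best_value:
            best_centers, best_value = candidate, value
    return best_centers, best_value


# Hypothetical toy data: two well-separated groups on a line.
samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
           (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
centers, value = brute_force_kcenter(samples, 2)
print(centers, value)
```

On this toy input the optimal centers are the middle sample of each group, giving a maximum within-cluster distance of 1.0. Exhaustive enumeration of this kind becomes infeasible well before the dataset sizes targeted in the paper, which is why a bounding scheme with pruning is needed there.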
