EKM: An exact, polynomial-time algorithm for the $K$-medoids problem
Xi He, Max A. Little
TL;DR
This work addresses the exact $K$-medoids clustering problem by introducing EKM, an algorithm with worst-case time complexity $O(N^{K+1})$, derived using the Bird-Meertens formalism and shortcut fusion so that global optimality is guaranteed. EKM replaces traditional exponential-time approaches with a provably correct, recursive combinatorial generator that fuses evaluation and selection into the recursion, which yields a polynomial worst-case time bound and a parallelizable implementation. Empirical results on real-world and synthetic data demonstrate that EKM achieves the exact optimum and scales to datasets up to $N=5000$, often outperforming MIP-based solvers in wall-clock time. The work also discusses memory considerations and possible constraint handling via semiring lifting, and outlines directions for further parallel optimization and practical enhancements.
Abstract
The $K$-medoids problem is a challenging combinatorial clustering task, widely used in data analysis applications. While numerous algorithms have been proposed to solve this problem, none of them can obtain an exact (globally optimal) solution in polynomial time. In this paper, we present EKM: a novel algorithm for solving this problem exactly with worst-case $O\left(N^{K+1}\right)$ time complexity. EKM is developed according to recent advances in transformational programming and combinatorial generation, using formal program derivation steps. The derived algorithm is provably correct by construction. We demonstrate the effectiveness of our algorithm by comparing it against various approximate methods on numerous real-world datasets. We show that the wall-clock run time of our algorithm matches the worst-case time complexity analysis on synthetic datasets, clearly outperforming the exponential-time complexity of benchmark branch-and-bound-based MIP solvers. To our knowledge, this is the first rigorously proven, polynomial-time, practical algorithm for this ubiquitous problem.
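To make the $O\left(N^{K+1}\right)$ bound concrete, the following is a minimal sketch of exhaustive exact $K$-medoids in Python. It is not the EKM derivation described above, which is obtained by formal program transformation and fuses evaluation into a recursive generator. It is only a naive enumeration showing where the exponent comes from: there are $\binom{N}{K} = O(N^K)$ candidate medoid sets for fixed $K$, and scoring each one costs $O(NK)$. The function name `exact_k_medoids`, the default Euclidean distance, and the sum-of-distances objective are assumptions made for illustration.

```python
from itertools import combinations
import math


def exact_k_medoids(points, k, dist=math.dist):
    """Naive exact K-medoids by exhaustive enumeration (illustrative only).

    Tries every size-k subset of the data as the medoid set and keeps the
    one with the smallest total distance from each point to its nearest
    medoid. `dist` defaults to Euclidean distance (an assumption here).
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    # Precompute all pairwise distances once: O(N^2) time and memory.
    d = [[dist(p, q) for q in points] for p in points]

    best_cost, best_medoids = math.inf, None
    # O(N^K) candidate medoid sets for fixed K ...
    for medoids in combinations(range(n), k):
        # ... each scored in O(N * K): nearest medoid for every point.
        cost = sum(min(d[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:  # strict '<' keeps the first optimum found
            best_cost, best_medoids = cost, medoids
    return best_medoids, best_cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    medoids, cost = exact_k_medoids(data, k=2)
    print(medoids, cost)
```

Because every candidate set is examined, the returned solution is globally optimal for the chosen objective. The sketch stores the full $N \times N$ distance matrix, so its memory use grows quadratically in $N$.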
