Memetic Differential Evolution Methods for Semi-Supervised Clustering
Pierluigi Mansueto, Fabio Schoen
TL;DR
The paper tackles semi-supervised Minimum Sum-of-Squares Clustering (MSSC) with must-link and cannot-link constraints, an NP-hard problem. It introduces S-MDEClust, a memetic differential evolution framework that guarantees feasibility through exact and greedy assignment procedures, a constraint-aware mutation operator, and a semi-supervised local-search step. Extensive experiments compare S-MDEClust variants against COP-K-MEANS, Baumann's BLP-KM, and the state-of-the-art global methods PC-SOS-SDP and S-HG-MEANS, showing favorable performance in both feasibility and efficiency. According to the authors, S-MDEClust is the first memetic approach designed to produce feasible solutions for semi-supervised MSSC, and it provides a foundation for future constraint-handling enhancements in clustering.
Abstract
In this paper, we propose an extension to semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems of MDEClust, a memetic framework based on the Differential Evolution paradigm for unsupervised clustering. In semi-supervised MSSC, background knowledge is available in the form of (instance-level) "must-link" and "cannot-link" constraints, which indicate whether two dataset points should be assigned to the same cluster or to different clusters, respectively. The presence of such constraints makes the problem at least as hard as its unsupervised version and, as a consequence, some framework operations need to be carefully designed to handle this additional complexity: for instance, it is no longer true that each point is assigned to its nearest cluster center. As far as we know, our new framework, called S-MDEClust, represents the first memetic methodology designed to generate a (hopefully) optimal feasible solution for semi-supervised MSSC problems. Results of thorough computational experiments on a set of well-known as well as synthetic datasets show the effectiveness and efficiency of our proposal.
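The remark that points are no longer necessarily assigned to their nearest center can be illustrated with a small sketch. The toy data, the constraint pairs, and the helper names (`mssc_objective`, `nearest_center_labels`, `violated_constraints`) are assumptions made for illustration only; they are not taken from the S-MDEClust implementation, and the centers are held fixed rather than re-optimized.

```python
import numpy as np

def mssc_objective(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    diffs = X - centers[labels]
    return float(np.sum(diffs ** 2))

def nearest_center_labels(X, centers):
    """Unsupervised assignment: each point goes to its closest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def violated_constraints(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs broken by a labeling."""
    ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return ml, cl

# Hypothetical toy instance: four points on a line, two fixed centers.
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
centers = np.array([[0.5, 0.0], [9.5, 0.0]])
must_link = [(1, 2)]    # points 1 and 2 must share a cluster
cannot_link = [(0, 1)]  # points 0 and 1 must be separated

# Nearest-center assignment ignores the constraints and breaks both of them.
nearest = nearest_center_labels(X, centers)
ml_bad, cl_bad = violated_constraints(nearest, must_link, cannot_link)

# A feasible labeling has to send point 1 to the farther center.
feasible = np.array([0, 1, 1, 1])
ml_ok, cl_ok = violated_constraints(feasible, must_link, cannot_link)

print("nearest-center labels:", nearest.tolist(), "violations:", ml_bad, cl_bad)
print("feasible labels:", feasible.tolist(), "violations:", ml_ok, cl_ok)
print("objective (nearest):", mssc_objective(X, nearest, centers))
print("objective (feasible):", mssc_objective(X, feasible, centers))
```

In this instance the feasible labeling has a much larger objective value for the same centers, which is why the assignment step in a constrained setting cannot simply reuse the nearest-center rule of the unsupervised case.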
