LSEC: Large-scale spectral ensemble clustering
Hongmin Li, Xiucai Ye, Akira Imakura, Tetsuya Sakurai
TL;DR
This paper addresses the efficiency bottleneck of large-scale ensemble clustering by introducing LSEC, which combines a divide-and-conquer large-scale spectral clustering approach for generating diverse base clusterings with a bipartite-graph-based consensus function. It introduces two acceleration tricks, reusing $K$-nearest neighbors and light-$k$-means, to reduce computation without sacrificing accuracy. The method has a lower overall complexity than most existing ensemble clustering approaches and, on ten large-scale datasets, performs strongly in terms of ACC and NMI while also running faster. The result is a practical, scalable framework for ensemble clustering that can handle datasets with millions of points.
Abstract
Ensemble clustering, a fundamental problem in machine learning, combines multiple base clusterings into a better clustering result. However, most existing methods are unsuitable for large-scale ensemble clustering tasks because of efficiency bottlenecks. In this paper, we propose a large-scale spectral ensemble clustering (LSEC) method that strikes a good balance between efficiency and effectiveness. In LSEC, an efficient ensemble generation framework based on large-scale spectral clustering generates diverse base clusterings with low computational complexity. All base clusterings are then combined into a better consensus clustering result through a consensus function based on bipartite graph partitioning. The LSEC method achieves a lower computational complexity than most existing ensemble clustering methods. Experiments conducted on ten large-scale datasets demonstrate the efficiency and effectiveness of the LSEC method. The MATLAB code of the proposed method and the experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.
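
To make the idea of a bipartite-graph-based consensus function concrete, the following is a minimal Python sketch, not the authors' MATLAB implementation. It assumes a generic spectral partition of the object-cluster bipartite graph (normalized SVD followed by a small k-means); the function names `membership_matrix`, `kmeans`, and `bipartite_consensus` are hypothetical, and dense NumPy arrays are used for brevity, whereas a large-scale setting would presumably call for sparse matrices.

```python
import numpy as np


def membership_matrix(base_labels):
    """Stack one-hot indicators of every base cluster: shape (n_objects, total_clusters)."""
    blocks = []
    for labels in base_labels:
        _, idx = np.unique(np.asarray(labels), return_inverse=True)
        blocks.append(np.eye(idx.max() + 1)[idx])
    return np.hstack(blocks)


def kmeans(X, k, n_iter=50):
    """Tiny deterministic k-means with farthest-first initialization (illustrative only)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def bipartite_consensus(base_labels, k):
    """Partition the object-cluster bipartite graph into k consensus clusters."""
    B = membership_matrix(base_labels)   # edge (i, c) exists iff object i lies in base cluster c
    dx = B.sum(axis=1)                   # object degrees (equal to the number of base clusterings)
    dy = B.sum(axis=0)                   # base-cluster degrees (cluster sizes)
    A = B / np.sqrt(dx)[:, None] / np.sqrt(dy)[None, :]  # degree-normalized affinity
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    emb = U[:, :k] / np.sqrt(dx)[:, None]  # spectral embedding of the objects
    return kmeans(emb, k)


# Three toy base clusterings of six objects (assumed data, for illustration only).
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
consensus = bipartite_consensus(base, k=2)
```

In this toy input the first three objects always co-occur and never share a base cluster with the last three, so the consensus groups them into two clusters. The SVD here operates on a matrix with one column per base cluster, which is what keeps this style of consensus function inexpensive relative to building an object-by-object co-association matrix.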
