A Greedy Strategy for Graph Cut
Feiping Nie, Shenfei Pei, Zengwei Zheng, Rong Wang, Xuelong Li
TL;DR
The paper introduces GGC, a greedy agglomerative approach to the normalized cut problem that starts with singleton clusters and iteratively merges pairs to minimize the objective $f(\mathbf{Y})=\mathrm{Tr}\left((\mathbf{Y}^T \mathbf{D} \mathbf{Y})^{-1} \mathbf{Y}^T \mathbf{L} \mathbf{Y}\right)$, proving monotonic decrease and yielding a unique solution. To achieve scalability, it exploits Neighbor Cluster Searching and a red-black tree to maintain a compact set of candidate merges, resulting in near-linear time complexity $O((n-c) k_1 \log(n k_1))$. GGC is demonstrated to produce superior or competitive clustering quality compared with a range of baselines (including traditional spectral clustering and GANC) across 16 mid-scale and 7 large-scale real-world datasets, while also offering substantial runtime advantages on large datasets. The method is hyperparameter-free and extensible to other graph-cut objectives, presenting a practical, scalable alternative for spectral clustering tasks in large-scale settings.
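To make the objective concrete, here is a short worked expansion. It rests on assumptions not stated above: $\mathbf{Y} \in \{0,1\}^{n \times c}$ is a hard cluster indicator matrix, $\mathbf{W}$ is a symmetric nonnegative affinity matrix, $\mathbf{D}$ is its diagonal degree matrix, and $\mathbf{L} = \mathbf{D} - \mathbf{W}$. The symbols $\mathrm{vol}$, $\mathrm{cut}$ and $w$ are introduced here for illustration only.

$$
f(\mathbf{Y}) = \sum_{k=1}^{c} \frac{\mathrm{cut}(C_k)}{\mathrm{vol}(C_k)}, \qquad \mathrm{vol}(C_k) = \sum_{i \in C_k} d_i, \qquad \mathrm{cut}(C_k) = \mathrm{vol}(C_k) - \sum_{i, j \in C_k} W_{ij}.
$$

Under the same assumptions, merging two clusters $C_a$ and $C_b$ changes the objective by

$$
\Delta_{ab} = \frac{\mathrm{cut}(C_a) + \mathrm{cut}(C_b) - 2\, w(C_a, C_b)}{\mathrm{vol}(C_a) + \mathrm{vol}(C_b)} - \frac{\mathrm{cut}(C_a)}{\mathrm{vol}(C_a)} - \frac{\mathrm{cut}(C_b)}{\mathrm{vol}(C_b)}, \qquad w(C_a, C_b) = \sum_{i \in C_a,\, j \in C_b} W_{ij}.
$$

Since $(\mathrm{cut}(C_a) + \mathrm{cut}(C_b)) / (\mathrm{vol}(C_a) + \mathrm{vol}(C_b))$ is a mediant of the two old ratios, it cannot exceed the larger of them, and therefore cannot exceed their sum when both are nonnegative. This gives $\Delta_{ab} \le 0$, which is consistent with the monotonic decrease claimed above.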
Abstract
We propose GGC, a greedy strategy for solving graph cut problems. Starting from the state in which each data sample forms its own cluster, GGC repeatedly merges the two clusters whose merger reduces the value of the global objective function the most, until the required number of clusters is obtained. We prove that the resulting sequence of objective function values is monotonic. To reduce the computational complexity of GGC, only mergers between clusters and their neighbors are considered, which gives GGC a nearly linear computational complexity with respect to the number of samples. Moreover, unlike other algorithms, the greedy strategy makes the solution of the proposed algorithm unique; in other words, its performance is not affected by randomness. We apply the proposed method to normalized cut, a widely studied graph cut problem. Extensive experiments show that, on the normalized cut problem, GGC often achieves better solutions than the traditional two-stage optimization algorithm (eigendecomposition + k-means). In addition, GGC compares favorably with several state-of-the-art clustering algorithms.
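The greedy merging loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a dense, symmetric, nonnegative affinity matrix `W` in which every node has positive degree, and it rescans all neighboring cluster pairs at each step instead of using the neighbor cluster search and red-black tree that the paper relies on for near-linear time. The names `ncut_value` and `greedy_ncut` are hypothetical.

```python
import numpy as np


def ncut_value(W, labels):
    """Normalized cut of a partition: sum over clusters of cut(C) / vol(C)."""
    deg = W.sum(axis=1)
    total = 0.0
    for c in np.unique(labels):
        mask = labels == c
        vol = deg[mask].sum()
        inner = W[np.ix_(mask, mask)].sum()  # weight staying inside the cluster
        total += (vol - inner) / vol
    return total


def greedy_ncut(W, n_clusters):
    """Greedy agglomerative normalized cut (naive sketch, assumptions above)."""
    n = W.shape[0]
    A = np.array(W, dtype=float)   # A[a, b]: total weight between clusters a and b
    vol = A.sum(axis=1)            # cluster volumes (sum of member degrees)
    owner = np.arange(n)           # owner[i]: id of the cluster containing node i
    active = list(range(n))        # ids of clusters that still exist
    f = float(np.sum((vol - np.diag(A)) / vol))  # objective for singleton clusters
    history = [f]

    while len(active) > n_clusters:
        best = None
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                if A[a, b] <= 0:   # only neighboring clusters are candidates
                    continue
                old = (vol[a] - A[a, a]) / vol[a] + (vol[b] - A[b, b]) / vol[b]
                new_vol = vol[a] + vol[b]
                new_inner = A[a, a] + A[b, b] + 2.0 * A[a, b]
                delta = (new_vol - new_inner) / new_vol - old
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        if best is None:           # no connected pair of clusters is left
            break
        delta, a, b = best
        # Fold cluster b into cluster a in the aggregated weight matrix.
        A[a, :] += A[b, :]
        A[:, a] += A[:, b]
        A[b, :] = 0.0
        A[:, b] = 0.0
        vol[a] += vol[b]
        owner[owner == b] = a
        active.remove(b)
        f += delta
        history.append(f)

    _, labels = np.unique(owner, return_inverse=True)
    return labels, history
```

Because ties are broken by a fixed scan order and no random initialization is involved, repeated runs on the same `W` return the same partition, which mirrors the uniqueness property stated in the abstract. The `history` list records the objective after each merge, so the monotonic decrease can be checked directly.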
