Neural Normalized Cut: A Differential and Generalizable Approach for Spectral Clustering
Wei He, Shangzhi Zhang, Chun-Guang Li, Xianbiao Qi, Rong Xiao, Jun Guo
TL;DR
Spectral clustering relies on eigen-decomposition, which scales poorly and does not generalize to unseen data. NeuNcut reparameterizes the segmentation matrix (the cluster memberships) via a neural network with a softmax head and optimizes a relaxed normalized cut objective that combines a graph-Laplacian term with an orthogonality penalty, solved in an EM-like fashion to enable end-to-end learning. The trained network infers cluster memberships for new data directly and supports mini-batch training, which improves scalability, with reported clustering accuracy above that of traditional Ncut and embedding-based methods on large-scale and imbalanced datasets. The approach shows strong generalization, competitive runtimes, and flexible extensions to other spectral objectives, making it a practical differentiable alternative to classical spectral clustering for ultra-large-scale clustering tasks.
Abstract
Spectral clustering, a popular tool for data clustering, requires an eigen-decomposition step on a given affinity matrix to obtain the spectral embedding. This step, however, lacks generalizability and scalability. Moreover, the resulting spectral embedding rarely provides a good approximation to the ground-truth partition, so a k-means step is needed to quantize it. In this paper, we propose a simple yet effective, scalable, and generalizable approach, called Neural Normalized Cut (NeuNcut), that learns the clustering membership for spectral clustering directly. In NeuNcut, we reparameterize the unknown cluster membership via a neural network and train the network via stochastic gradient descent with a properly relaxed normalized cut loss. As a result, NeuNcut has the desired generalization ability to directly infer clustering membership for out-of-sample unseen data, and hence provides an efficient way to handle clustering tasks on ultra-large-scale data. We conduct extensive experiments on both synthetic data and benchmark datasets, and the experimental results validate the effectiveness and superiority of our approach. Our code is available at: https://github.com/hewei98/NeuNcut.
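
To make the idea of a relaxed normalized cut loss on softmax memberships concrete, the following is a minimal NumPy sketch, not the authors' implementation. The exact loss form is an assumption for illustration: a Laplacian term tr(PᵀLP) plus a penalty (gamma/2)·‖PᵀDP − I‖²_F, where P holds row-wise softmax memberships, L = D − W is the graph Laplacian, and D is the degree matrix. The names `softmax`, `relaxed_ncut_loss`, and `gamma` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; each row becomes a soft cluster membership."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def relaxed_ncut_loss(W, P, gamma=1.0):
    """Illustrative relaxed Ncut loss (assumed form, not the paper's exact one).

    W: (n, n) symmetric non-negative affinity matrix.
    P: (n, k) soft memberships, e.g. the softmax output of a network.
    """
    d = W.sum(axis=1)                      # node degrees
    L = np.diag(d) - W                     # unnormalized graph Laplacian
    k = P.shape[1]
    cut_term = np.trace(P.T @ L @ P)       # small when connected nodes share a cluster
    gram = P.T @ (d[:, None] * P)          # P^T D P
    ortho_term = np.sum((gram - np.eye(k)) ** 2)  # squared Frobenius norm
    return cut_term + 0.5 * gamma * ortho_term

# Toy graph: two disconnected pairs of nodes, {0, 1} and {2, 3}.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])

P_good = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])  # respects the two pairs
P_bad = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])   # splits both pairs

loss_good = relaxed_ncut_loss(W, P_good)
loss_bad = relaxed_ncut_loss(W, P_bad)
```

On this toy graph the partition that keeps each connected pair together gets the lower loss, because its Laplacian term vanishes. In a trainable setting, P would be `softmax` applied to the network's outputs on a mini-batch and W the affinity restricted to that batch; at inference time, memberships for unseen points would come from a forward pass alone, with no eigen-decomposition.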
