ConstellationNet: Reinventing Spatial Clustering through GNNs
Aidan Gao, Junhong Lin
TL;DR
ConstellationNet tackles spatial clustering in large-scale, high-dimensional data by fusing CNN-based embeddings with graph-based neighbor aggregation via a weighted $K$-NN graph and a differentiable clustering operator (DMoN). The architecture supports both supervised and unsupervised pipelines and uses minibatching for scalability, achieving state-of-the-art results on MNIST, CIFAR-10, and ImageNet while using far fewer parameters and training time than competing methods. It introduces a CNN embedding stage, a graph pooling/clustering objective, and a graph-augmented prediction mechanism, with ablations demonstrating the importance of embeddings (e.g., DINO, UMAP) and the DMoN losses. The work suggests strong practical impact for fast, scalable clustering in domains like epidemiology and medical imaging, and points to future work on unsupervised loss design for unbalanced data.
Abstract
Spatial clustering is a crucial field, finding universal use across criminology, pathology, and urban planning. However, most spatial clustering algorithms cannot pull information from nearby nodes and suffer performance drops when dealing with higher dimensionality and large datasets, making them suboptimal for large-scale and high-dimensional clustering. Due to modern data growing in size and dimension, clustering algorithms become weaker when addressing multifaceted issues. To improve upon this, we develop ConstellationNet, a convolution neural network(CNN)-graph neural network(GNN) framework that leverages the embedding power of a CNN, the neighbor aggregation of a GNN, and a neural network's ability to deal with batched data to improve spatial clustering and classification with graph augmented predictions. ConstellationNet achieves state-of-the-art performance on both supervised classification and unsupervised clustering across several datasets, outperforming state-of-the-art classification and clustering while reducing model size and training time by up to tenfold and improving baselines by 10 times. Because of its fast training and powerful nature, ConstellationNet holds promise in fields like epidemiology and medical imaging, able to quickly train on new data to develop robust responses.
