Table of Contents
Fetching ...

ConstellationNet: Reinventing Spatial Clustering through GNNs

Aidan Gao, Junhong Lin

TL;DR

ConstellationNet tackles spatial clustering in large-scale, high-dimensional data by fusing CNN-based embeddings with graph-based neighbor aggregation via a weighted $K$-NN graph and a differentiable clustering operator (DMoN). The architecture supports both supervised and unsupervised pipelines and uses minibatching for scalability, achieving state-of-the-art results on MNIST, CIFAR-10, and ImageNet while using far fewer parameters and training time than competing methods. It introduces a CNN embedding stage, a graph pooling/clustering objective, and a graph-augmented prediction mechanism, with ablations demonstrating the importance of embeddings (e.g., DINO, UMAP) and the DMoN losses. The work suggests strong practical impact for fast, scalable clustering in domains like epidemiology and medical imaging, and points to future work on unsupervised loss design for unbalanced data.

Abstract

Spatial clustering is a crucial field, finding universal use across criminology, pathology, and urban planning. However, most spatial clustering algorithms cannot pull information from nearby nodes and suffer performance drops when dealing with higher dimensionality and large datasets, making them suboptimal for large-scale and high-dimensional clustering. Due to modern data growing in size and dimension, clustering algorithms become weaker when addressing multifaceted issues. To improve upon this, we develop ConstellationNet, a convolution neural network(CNN)-graph neural network(GNN) framework that leverages the embedding power of a CNN, the neighbor aggregation of a GNN, and a neural network's ability to deal with batched data to improve spatial clustering and classification with graph augmented predictions. ConstellationNet achieves state-of-the-art performance on both supervised classification and unsupervised clustering across several datasets, outperforming state-of-the-art classification and clustering while reducing model size and training time by up to tenfold and improving baselines by 10 times. Because of its fast training and powerful nature, ConstellationNet holds promise in fields like epidemiology and medical imaging, able to quickly train on new data to develop robust responses.

ConstellationNet: Reinventing Spatial Clustering through GNNs

TL;DR

ConstellationNet tackles spatial clustering in large-scale, high-dimensional data by fusing CNN-based embeddings with graph-based neighbor aggregation via a weighted -NN graph and a differentiable clustering operator (DMoN). The architecture supports both supervised and unsupervised pipelines and uses minibatching for scalability, achieving state-of-the-art results on MNIST, CIFAR-10, and ImageNet while using far fewer parameters and training time than competing methods. It introduces a CNN embedding stage, a graph pooling/clustering objective, and a graph-augmented prediction mechanism, with ablations demonstrating the importance of embeddings (e.g., DINO, UMAP) and the DMoN losses. The work suggests strong practical impact for fast, scalable clustering in domains like epidemiology and medical imaging, and points to future work on unsupervised loss design for unbalanced data.

Abstract

Spatial clustering is a crucial field, finding universal use across criminology, pathology, and urban planning. However, most spatial clustering algorithms cannot pull information from nearby nodes and suffer performance drops when dealing with higher dimensionality and large datasets, making them suboptimal for large-scale and high-dimensional clustering. Due to modern data growing in size and dimension, clustering algorithms become weaker when addressing multifaceted issues. To improve upon this, we develop ConstellationNet, a convolution neural network(CNN)-graph neural network(GNN) framework that leverages the embedding power of a CNN, the neighbor aggregation of a GNN, and a neural network's ability to deal with batched data to improve spatial clustering and classification with graph augmented predictions. ConstellationNet achieves state-of-the-art performance on both supervised classification and unsupervised clustering across several datasets, outperforming state-of-the-art classification and clustering while reducing model size and training time by up to tenfold and improving baselines by 10 times. Because of its fast training and powerful nature, ConstellationNet holds promise in fields like epidemiology and medical imaging, able to quickly train on new data to develop robust responses.

Paper Structure

This paper contains 24 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: A visualization of a GNN transform functionintroduction.
  • Figure 2: The dataset transformation pipeline visualized on six samples from the MNIST dataset, moving from images to nodes on a graphmnist.
  • Figure 3: The Supervised CNN-GNN pipeline, acting on the end product of dataset construction. $x_i$ Represents a node’s attribute, and $E_{ij}$ denotes an edge between node $x_i$ and $x_j$. $C$ represents the number of channels an image has, and $D_n$ represents the size of the image.
  • Figure 4: The Unsupervised CNN-GNN pipeline, acting on the end product of dataset construction. Data is passed through two transforms before the GNN, which are static in this case.