Hierarchical clustering with maximum density paths and mixture models
Martin Ritzert, Polina Turishcheva, Laura Hansel, Paul Wollenhaupt, Marissa A. Weis, Alexander S. Ecker
TL;DR
t-NEB is a probabilistically grounded hierarchical clustering framework for high-dimensional data. It first overclusters the data with a Student's $t$ mixture model to obtain a density landscape, then uses nudged elastic band (NEB) paths to define maximum-density connections between clusters. A bottom-up merging procedure builds a hierarchy from the initial overclustered partition without re-estimating centers, yielding a dendrogram that reflects multi-scale structure. The approach achieves state-of-the-art or competitive performance on synthetic and real datasets (including MNIST-Nd embeddings and transcriptomic cell types) and provides interpretable hierarchies that reveal fine-grained patterns. By unifying density estimation and merging under a single probabilistic model and avoiding dimensionality reduction, t-NEB offers a robust, scalable tool for exploratory analysis of complex data with ambiguous cluster boundaries.
Abstract
Hierarchical clustering is an effective, interpretable method for analyzing structure in data. It reveals insights at multiple scales without requiring a predefined number of clusters and captures nested patterns and subtle relationships, which are often missed by flat clustering approaches. However, existing hierarchical clustering methods struggle with high-dimensional data, especially when there are no clear density gaps between modes. In this work, we introduce t-NEB, a probabilistically grounded hierarchical clustering method, which yields state-of-the-art clustering performance on naturalistic high-dimensional data. t-NEB consists of three steps: (1) density estimation via overclustering; (2) finding maximum density paths between clusters; (3) creating a hierarchical structure via bottom-up cluster merging. t-NEB uses a probabilistic parametric density model for both overclustering and cluster merging, which yields both high clustering performance and a meaningful hierarchy, making it a valuable tool for exploratory data analysis. Code is available at https://github.com/ecker-lab/tneb clustering.
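To make step (3) concrete, the following is a minimal sketch of greedy bottom-up merging, not the authors' implementation. It assumes steps (1) and (2) have already produced a symmetric matrix `score`, where `score[i][j]` is a hypothetical stand-in for how well components i and j are connected (for example, the lowest density encountered along the maximum-density path between them). The function name `merge_hierarchy` and the single-linkage-style rule for scoring merged clusters are illustrative assumptions; the abstract does not specify the exact merge criterion.

```python
def merge_hierarchy(score):
    """Greedy bottom-up merging from a symmetric pairwise score matrix.

    score[i][j] is an assumed connectivity score between initial components
    i and j (higher means better connected). Component parameters are never
    re-estimated; only the grouping changes.
    Returns a list of (members_a, members_b, link_score) in merge order.
    """
    # Each cluster starts as a single overclustered component.
    clusters = {i: [i] for i in range(len(score))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Assumption: two clusters are as connected as their
                # best-connected pair of components (single-linkage style).
                link = max(score[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        link, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), link))
        # Merge b into a and drop b.
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Toy example: components 0/1 and 2/3 are strongly connected pairs.
score = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.1],
    [0.2, 0.3, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
merges = merge_hierarchy(score)
for left, right, link in merges:
    print(left, "+", right, "at score", link)
```

The merge order, read from first to last, traces a dendrogram from the finest partition up to a single cluster; cutting it at any score threshold gives a flat clustering at that scale.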
