Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani
TL;DR
This work addresses clustering with signed pairwise dissimilarities by introducing Hierarchical Correlation Clustering (HCC), which produces multi-level clusters within an agglomerative framework. It then develops two representation-learning routes. The first is a tree-preserving embedding that turns an HCC dendrogram into vector features. The second extends minimax dissimilarities to correlation clustering in order to capture transitive, elongated structures at reduced computational cost. Key contributions include the formal definition of HCC, a level-based ultrametric distance that enables embeddings via multidimensional scaling (MDS), theoretical results on minimax-based clustering, including shift invariance, and extensive experiments on UCI datasets, Fashion-MNIST, and other corpora that show robustness to noise and superior downstream performance (e.g., with a Gaussian mixture model, GMM). Overall, the approach offers practical tools for unsupervised learning with signed dissimilarities, providing both hierarchical clustering and feature representations that improve downstream clustering performance and interpretability.
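To make the idea of a level-based ultrametric concrete, here is a minimal sketch in plain Python. It is not the authors' HCC algorithm: the linkage (summed pairwise dissimilarity between two clusters), the convention that the level is the merge-step index, and the name `level_ultrametric` are all assumptions chosen for illustration. The sketch merges clusters agglomeratively on a signed dissimilarity matrix and records, for each pair of objects, the level at which they first share a cluster.

```python
def level_ultrametric(D):
    """Agglomerative merging on a signed dissimilarity matrix D (list of lists).

    Returns U, where U[i][j] is the merge level (1, 2, ...) at which objects
    i and j first fall into the same cluster.
    ASSUMPTION: linkage is the summed pairwise dissimilarity between clusters.
    """
    n = len(D)
    clusters = [[i] for i in range(n)]
    U = [[0] * n for _ in range(n)]
    level = 0
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest summed dissimilarity.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        level += 1
        # Every cross pair between the two merged clusters meets at this level.
        for i in clusters[a]:
            for j in clusters[b]:
                U[i][j] = U[j][i] = level
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return U


# Toy signed dissimilarities: negative = similar, positive = dissimilar.
D = [
    [0, -2, 3, 4],
    [-2, 0, 5, 3],
    [3, 5, 0, -1],
    [4, 3, -1, 0],
]
U = level_ultrametric(D)
print(U)  # [[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
```

Because levels only increase as merging proceeds, the resulting matrix satisfies the ultrametric inequality, so it can be passed as a precomputed dissimilarity matrix to an MDS routine to obtain vector features.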
Abstract
We propose a hierarchical correlation clustering method that extends the well-known correlation clustering to produce hierarchical clusters and applies to both positive and negative pairwise dissimilarities. We then study unsupervised representation learning with this hierarchical correlation clustering. For this purpose, we first investigate embedding the resulting hierarchy, which yields a tree-preserving embedding for feature extraction. We then study the extension of minimax distance measures to correlation clustering as another representation learning paradigm. Finally, we demonstrate the performance of our methods on several datasets.
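As a rough illustration of minimax distances on signed dissimilarities, the sketch below uses the standard path-based definition: the minimax distance between two objects is the smallest achievable value, over all connecting paths, of the largest edge weight on the path. It is a generic Floyd-Warshall-style computation in plain Python, assumed here for illustration rather than taken from the paper, and the function name `minimax_distances` is hypothetical.

```python
def minimax_distances(D):
    """All-pairs minimax distances for a symmetric dissimilarity matrix D.

    M[i][j] = min over paths from i to j of the maximum edge weight on the path.
    Entries of D may be negative; the diagonal is left unchanged.
    """
    n = len(D)
    M = [row[:] for row in D]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i != j:
                    # Cost of routing i -> k -> j is the larger of the two legs.
                    via_k = max(M[i][k], M[k][j])
                    if via_k < M[i][j]:
                        M[i][j] = via_k
    return M


# Objects 0 and 2 are directly dissimilar (+3) but linked through object 1.
D = [
    [0, -1, 3],
    [-1, 0, -2],
    [3, -2, 0],
]
print(minimax_distances(D))  # [[0, -1, -1], [-1, 0, -2], [-1, -2, 0]]
```

In the toy matrix, objects 0 and 2 end up with a negative minimax distance because a chain of similar pairs connects them, which is the kind of transitive, elongated structure that minimax measures are intended to capture. This cubic-time version is only meant to show the definition, not an efficient procedure.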
