visClust: A visual clustering algorithm based on orthogonal projections
Anna Breger, Clemens Karner, Martin Ehler
TL;DR
visClust is a fast clustering method with few parameters. It uses random orthogonal projections drawn from the Grassmannian to produce low-dimensional representations of the data, encodes these as binary images, and partitions them with simple image-processing steps. By iteratively sampling projections, filtering, thresholding, and analyzing connected components, it selects a partition that matches the target cluster count $n_c$, and it requires only one obligatory input parameter in the default setting. Across synthetic and publicly available datasets, visClust achieves strong ACC and ARI scores with favorable runtime and RAM consumption, often outperforming six well-known baselines. It remains robust under default settings and can benefit from parameter tuning or, for imaging data, from nonlinear projections. The code is publicly available, and the approach can be extended to higher-dimensional or imaging-specific tasks through nonlinear embeddings such as t-SNE.
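The loop summarized above (sample a projection, rasterize, filter, threshold, count connected components, accept when the count equals $n_c$) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the grid size, the box filter, the threshold of one, the 4-connectivity, and the retry limit are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random
from collections import deque


def random_orthonormal_pair(dim, rng):
    """Two orthonormal vectors in R^dim (Gram-Schmidt on Gaussian samples)."""
    u = [rng.gauss(0, 1) for _ in range(dim)]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]
    v = [rng.gauss(0, 1) for _ in range(dim)]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]
    norm_v = math.sqrt(sum(x * x for x in v))
    return u, [x / norm_v for x in v]


def rasterize(points2d, size):
    """Map 2-D points to pixels of a size x size grid, using one common scale."""
    x0 = min(p[0] for p in points2d)
    y0 = min(p[1] for p in points2d)
    scale = max(max(p[0] for p in points2d) - x0,
                max(p[1] for p in points2d) - y0) or 1.0
    return [(int((x - x0) / scale * (size - 1)),
             int((y - y0) / scale * (size - 1))) for x, y in points2d]


def filter_and_threshold(pixels, size, radius=1, threshold=1):
    """Box-filter the point-count image, then threshold it to a binary image."""
    counts = [[0] * size for _ in range(size)]
    for i, j in pixels:
        counts[i][j] += 1
    binary = [[False] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            total = sum(counts[a][b]
                        for a in range(max(0, i - radius), min(size, i + radius + 1))
                        for b in range(max(0, j - radius), min(size, j + radius + 1)))
            binary[i][j] = total >= threshold
    return binary


def connected_components(binary, size):
    """Label 4-connected foreground regions with a breadth-first flood fill."""
    labels = [[-1] * size for _ in range(size)]
    count = 0
    for i in range(size):
        for j in range(size):
            if not binary[i][j] or labels[i][j] != -1:
                continue
            labels[i][j] = count
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= na < size and 0 <= nb < size
                            and binary[na][nb] and labels[na][nb] == -1):
                        labels[na][nb] = count
                        queue.append((na, nb))
            count += 1
    return labels, count


def visual_cluster_sketch(data, n_clusters, size=32, max_tries=200, seed=0):
    """Return one label per point, or None if no projection gave n_clusters."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        u, v = random_orthonormal_pair(len(data[0]), rng)
        projected = [(sum(a * b for a, b in zip(u, x)),
                      sum(a * b for a, b in zip(v, x))) for x in data]
        pixels = rasterize(projected, size)
        binary = filter_and_threshold(pixels, size)
        labels, count = connected_components(binary, size)
        if count == n_clusters:
            # With threshold=1 every data pixel is foreground, so it has a label.
            return [labels[i][j] for i, j in pixels]
    return None
```

Calling `visual_cluster_sketch(points, n_clusters=2)` on a list of equal-length coordinate lists returns integer labels in the order of the input points. In this sketch the only required argument besides the data is the target cluster count, mirroring the single obligatory parameter described above.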
Abstract
We present visClust, a novel clustering algorithm based on lower-dimensional data representations and visual interpretation. To this end, we design a transformation that represents the data as a binary integer array, which enables the use of image processing methods to select a partition. Qualitative and quantitative analyses, measured in accuracy and an adjusted Rand index, show that the algorithm performs well while requiring low runtime and RAM. We compare the results to six state-of-the-art algorithms with available code, confirming the quality of visClust by superior performance in most experiments. Moreover, the algorithm requires just one obligatory input parameter while allowing optimization via optional parameters. The code is available on GitHub and is straightforward to use.
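For readers unfamiliar with the second evaluation measure, the following sketch computes the adjusted Rand index in its standard pair-counting form (Hubert and Arabie). The abstract does not state which variant the authors use, so treating it as this standard form is an assumption, and the function name is chosen here for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two labelings of the same points."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs of points grouped together in both, in the first, in the second.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # value expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)
```

The index is invariant to renaming the cluster labels, equals 1 for identical partitions, and is close to 0 for partitions that agree only as much as chance would predict.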
