G-Mapper: Learning a Cover in the Mapper Construction
Enrique Alvarado, Robin Belton, Emily Fischer, Kang-Ju Lee, Sourabh Palande, Sarah Percival, Emilie Purvine
TL;DR
G-Mapper addresses the challenge of tuning the Mapper cover by adaptively splitting cover elements using a Gaussian mixture model guided by the Anderson–Darling normality test, so that no initialized open cover has to be supplied. The method uses the test statistic $A_*^2$ to decide whether to split a cover element. When a split is made, the fitted component means $m_1, m_2$ and standard deviations $\sigma_1, \sigma_2$, together with an overlap parameter $\texttt{g\_overlap}$, determine two overlapping intervals. The resulting Mapper graphs are faithful to the data and are computed faster than with a previous iterative method. Empirical results on synthetic and real-world datasets show that G-Mapper captures essential data structure and often outperforms Multipass BIC, F-Mapper, and balanced covers in both quality metrics (e.g., Silhouette) and speed, while remaining applicable to high-dimensional settings. The work provides open-source code and demonstrates that the interval counts it produces can serve as informative inputs to other Mapper variants, which makes the method more practical for topological data analysis.
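To make the splitting step concrete, the following minimal Python sketch forms two overlapping intervals from an already fitted two-component mixture. The function name `split_interval` and the cut rule are assumptions for illustration, since the summary above does not state the formula: the cut point is placed between $m_1$ and $m_2$ in proportion to $\sigma_1$ and $\sigma_2$, and each interval is then extended so that the overlap has length $\texttt{g\_overlap}$ times $(m_2 - m_1)$.

```python
def split_interval(a, b, m1, m2, s1, s2, g_overlap):
    """Split [a, b] into two overlapping intervals (illustrative rule only).

    m1, m2 are the component means and s1, s2 the component standard
    deviations of a two-component Gaussian mixture fitted beforehand.
    The cut rule below is an assumption, not taken from the text above.
    """
    if s1 < 0 or s2 < 0 or s1 + s2 == 0:
        raise ValueError("standard deviations must be non-negative, not both zero")
    if m1 > m2:
        # Keep the components ordered from left to right.
        m1, m2, s1, s2 = m2, m1, s2, s1
    gap = m2 - m1
    total = s1 + s2
    # With g_overlap = 0 both bounds coincide at m1 + s1 / total * gap.
    left_upper = min(m1 + (1 + g_overlap) * (s1 / total) * gap, b)
    right_lower = max(m2 - (1 + g_overlap) * (s2 / total) * gap, a)
    return (a, left_upper), (right_lower, b)


if __name__ == "__main__":
    left, right = split_interval(0.0, 10.0, 2.0, 8.0, 1.0, 2.0, 0.1)
    print("left interval:", left)
    print("right interval:", right)
```

Under this rule the component with the larger standard deviation receives the larger share of the gap between the two means, and clamping to $a$ and $b$ keeps both intervals inside the original cover element.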
Abstract
The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a "nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by repeatedly splitting cover elements according to a statistical test for normality. Our algorithm is based on G-means clustering, which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson–Darling test. Our splitting procedure employs a Gaussian mixture model to choose the cover according to the distribution of the given data. Experiments on synthetic and real-world datasets demonstrate that our algorithm generates covers for which the Mapper graphs retain the essence of the datasets, while also running significantly faster than a previous iterative method.
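The split decision described above can be sketched in Python using only the standard library. The sketch computes the Anderson–Darling statistic for one-dimensional values against a normal distribution with estimated mean and standard deviation, and applies the small-sample correction $A_*^2 = A^2(1 + 4/n - 25/n^2)$ used in G-means. The function names, the minimum sample size, and the default threshold are hypothetical choices for illustration, not values taken from the paper.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_corrected(values):
    """Corrected Anderson-Darling statistic A*^2 for a normality test."""
    x = sorted(values)
    n = len(x)
    if n < 8:  # arbitrary guard for this sketch; the correction needs n >= 4
        raise ValueError("too few points for a meaningful test")
    mu, sd = fmean(x), stdev(x)
    if sd == 0:
        raise ValueError("all values are identical")
    std_normal = NormalDist()
    eps = 1e-12  # keep the logarithms finite for extreme values
    cdf = [min(max(std_normal.cdf((v - mu) / sd), eps), 1 - eps) for v in x]
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    s = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - s / n
    return a2 * (1 + 4 / n - 25 / n**2)


def should_split(values, threshold=2.0):
    """Return True if the values look non-Gaussian (threshold is illustrative)."""
    return anderson_darling_corrected(values) > threshold


if __name__ == "__main__":
    # Deterministic test data: evenly spaced standard normal quantiles.
    base = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
    bimodal = [v - 5 for v in base] + [v + 5 for v in base]
    print("unimodal split:", should_split(base))
    print("bimodal split:", should_split(bimodal))
```

In an iterative procedure of this kind, a cover element whose values fail the test would be split in two and each part tested again, so a larger threshold yields fewer splits and a coarser cover.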
