A Mapper Algorithm with implicit intervals and its optimization
Yuyang Tao, Shufei Ge
TL;DR
This work addresses limitations of the Mapper algorithm related to fixed interval covers and parameter tuning by introducing a probabilistic Soft Mapper with implicit intervals defined by a hidden assignment matrix. It uses a Gaussian Mixture Model to derive a row-wise assignment probability $Q$ and samples a Mapper graph via a multinomial scheme, while also defining a Mapper graph mode for a robust point estimate. A persistence-informed topological loss is combined with negative log-likelihood and optimized with stochastic gradient descent, yielding graphs that better capture underlying topology in noisy data. The method demonstrates competitive or improved topological fidelity on synthetic datasets and identifies a distinct Alzheimer's-related subgroup in an MSBB RNA-expression dataset, highlighting practical utility in biomedical topology analysis.
Abstract
The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis (TDA) and has been widely used in biomedical research. It outputs a combinatorial graph whose structure reflects the shape of the data. However, the need for manual parameter tuning, together with fixed intervals and fixed overlap ratios, may impede the performance of the standard Mapper algorithm. Variants of the standard Mapper algorithm have been developed to address these limitations, yet most of them still require manual tuning of parameters. Additionally, many of these variants, including the standard version found in the literature, were built within a deterministic framework and overlook the uncertainty inherent in the data. To relax these limitations, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent. Specifically, we develop a soft Mapper framework based on a Gaussian mixture model (GMM) for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimate of the output graph. Moreover, a stochastic gradient descent algorithm with a specific topological loss function is proposed for optimizing the model parameters. Both simulation and application studies demonstrate its effectiveness in capturing the underlying topological structures. In addition, the application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB) successfully identifies a distinct subgroup of Alzheimer's disease.
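To make the idea of implicit intervals concrete, the following minimal sketch computes a row-wise assignment probability matrix $Q$ from a one-dimensional GMM over filter values and then samples a hidden 0/1 assignment matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy filter values, the GMM parameters, and the rule "a point belongs to every component drawn at least once in `n_draws` multinomial trials" are all hypothetical choices made here. Clustering within each implicit interval and building the graph from the clusters are omitted.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def assignment_probabilities(filter_values, weights, means, stds):
    """Row-wise GMM responsibilities: Q[i][k] = P(component k | filter value i)."""
    Q = []
    for f in filter_values:
        dens = [w * gaussian_pdf(f, m, s) for w, m, s in zip(weights, means, stds)]
        total = sum(dens)
        Q.append([d / total for d in dens])
    return Q


def sample_assignment(Q, n_draws, rng):
    """Sample a hidden 0/1 assignment matrix H (assumed scheme, see lead-in).

    H[i][k] = 1 if component k is drawn at least once in n_draws
    multinomial trials with probabilities Q[i].
    """
    H = []
    for row in Q:
        draws = set(rng.choices(range(len(row)), weights=row, k=n_draws))
        H.append([1 if k in draws else 0 for k in range(len(row))])
    return H


def implicit_intervals(H):
    """Indices of the points assigned to each implicit interval (GMM component)."""
    n_components = len(H[0])
    return [[i for i, row in enumerate(H) if row[k]] for k in range(n_components)]


# Toy example: five filter values and a two-component GMM (all values hypothetical).
filter_values = [0.0, 0.1, 0.5, 0.9, 1.0]
weights, means, stds = [0.5, 0.5], [0.0, 1.0], [0.3, 0.3]

Q = assignment_probabilities(filter_values, weights, means, stds)
H = sample_assignment(Q, n_draws=2, rng=random.Random(0))
covers = implicit_intervals(H)
```

In this sketch, points whose filter values lie between two component means have comparable probabilities for both components, so they are the ones most likely to land in two implicit intervals at once. That overlap plays the role of the fixed overlap ratio in the standard Mapper cover, but it arises from the model parameters rather than from a hand-tuned setting.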
