Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction
Janis Keck, Lukas Silvester Barth, Fatemeh Fahimi, Parvaneh Joharinad, Jürgen Jost
TL;DR
This work provides a probabilistic foundation for fuzzy simplicial sets by showing that they are marginals of distributions over standard simplicial sets, thereby linking fuzzy weights to generative models on simplicial complexes. It clarifies how UMAP can be viewed as marginalizing Vietoris–Rips filtrations and generalizes this to Čech filtrations with triplet sampling (ČUMAP), while connecting divergences, t-norms, and merging operations within a unified framework. The approach yields new embedding methods and theoretical insights, including curvature-based and rank-order variants, and offers a pathway to systematic generalizations of nonlinear dimensionality reduction. Overall, the probabilistic perspective broadens the interpretability and extensibility of manifold learning and visualization methods based on fuzzy simplicial sets.
Abstract
Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, because they are defined through tools from algebraic topology without a clear probabilistic interpretation, they remain detached from the theoretical frameworks commonly used in those areas. In this work we introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets. In particular, this perspective shows that the fuzzy weights of UMAP arise from a generative model that samples Vietoris–Rips filtrations at random scales, yielding cumulative distribution functions of pairwise distances. More generally, the framework connects fuzzy simplicial sets to probabilistic models on the face poset, clarifies the relation between Kullback–Leibler divergence and fuzzy cross-entropy in this setting, and recovers standard t-norms and t-conorms via Boolean operations on the underlying simplicial sets. We then show how new embedding methods may be derived from this framework and illustrate this with an example that generalizes UMAP using Čech filtrations with triplet sampling. In summary, this viewpoint provides a unified probabilistic foundation for fuzzy simplicial sets, clarifies the role of UMAP within this framework, and enables the systematic derivation of new dimensionality reduction methods.
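The marginalization idea in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' construction: the exponential scale distribution, the convention that an edge is present when its length is at most the sampled scale, and all function names are assumptions made here for illustration. Under these assumptions, the frequency with which an edge appears across sampled Vietoris–Rips complexes approaches the tail probability exp(-d / sigma) of the scale distribution, which has the same exponential-decay shape as UMAP's edge weights. The second helper checks that the Boolean union of two independently sampled edge indicators has marginal a + b - ab, the probabilistic-sum t-conorm.

```python
import math
import random


def sample_edge_indicators(distances, sigma, rng):
    """Draw one random scale r ~ Exponential(mean=sigma) (an assumed choice)
    and report, for each pairwise distance d, whether the edge belongs to the
    Vietoris-Rips complex at that scale (convention assumed here: d <= r)."""
    r = rng.expovariate(1.0 / sigma)
    return [d <= r for d in distances]


def estimate_marginals(distances, sigma, n_samples, rng):
    """Estimate each edge's marginal probability of being present by
    averaging its indicator over many independently sampled scales."""
    counts = [0] * len(distances)
    for _ in range(n_samples):
        indicators = sample_edge_indicators(distances, sigma, rng)
        for i, present in enumerate(indicators):
            counts[i] += present
    return [c / n_samples for c in counts]


def estimate_union(p, q, n_samples, rng):
    """Estimate the marginal of the Boolean OR of two independent edge
    indicators with presence probabilities p and q."""
    hits = 0
    for _ in range(n_samples):
        a = rng.random() < p
        b = rng.random() < q
        hits += a or b
    return hits / n_samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
distances = [0.5, 1.0, 2.0]  # hypothetical pairwise distances
sigma = 1.0  # hypothetical scale parameter

empirical = estimate_marginals(distances, sigma, 20000, rng)
closed_form = [math.exp(-d / sigma) for d in distances]

union_empirical = estimate_union(0.6, 0.3, 20000, rng)
union_closed_form = 0.6 + 0.3 - 0.6 * 0.3  # probabilistic sum t-conorm
```

Because a single scale is shared by all edges in each draw, the sampled complexes are nested and the edge indicators are strongly coupled; only the per-edge marginals are compared with the closed form. The union check, by contrast, assumes the two indicators are independent.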
