Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction

Janis Keck, Lukas Silvester Barth, Fatemeh Fahimi, Parvaneh Joharinad, Jürgen Jost

TL;DR

This work provides a probabilistic foundation for fuzzy simplicial sets by showing that they are marginals of distributions over standard simplicial sets, thereby linking fuzzy weights to generative models on simplicial complexes. It clarifies how UMAP can be viewed as marginalizing Vietoris–Rips filtrations and generalizes this view to Čech filtrations with triplet sampling (ČUMAP), while connecting divergences, t-norms, and merging operations within a unified framework. The approach yields new embedding methods and theoretical insights, including curvature-based and rank-order variants, and offers a pathway to systematic generalizations of nonlinear dimensionality reduction. Overall, the probabilistic perspective broadens the interpretability and extensibility of manifold learning and visualization methods based on fuzzy simplicial sets.
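The idea of a fuzzy weight as a marginal over random scales can be illustrated with a small simulation. The sketch below is not the paper's implementation: it assumes, purely for illustration, that the filtration radius is exponentially distributed, and the function names are hypothetical. An edge of length `distance` belongs to the Vietoris–Rips complex at radius `r` exactly when `distance <= r`, so its weight is the probability of that event.

```python
import math
import random


def edge_weight_monte_carlo(distance, scale, n_samples=200_000, seed=0):
    """Estimate P(edge present) for a radius r ~ Exponential(mean=scale).

    The exponential distribution is an illustrative assumption.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        r = rng.expovariate(1.0 / scale)  # expovariate takes the rate 1/mean
        if distance <= r:  # edge is in VR(r) iff its length is at most r
            hits += 1
    return hits / n_samples


def edge_weight_closed_form(distance, scale):
    """Tail probability P(r >= distance) of the exponential distribution."""
    return math.exp(-distance / scale)


estimate = edge_weight_monte_carlo(distance=0.7, scale=1.0)
exact = edge_weight_closed_form(distance=0.7, scale=1.0)
print(f"Monte Carlo: {estimate:.3f}, closed form: {exact:.3f}")
```

Under this assumption the weight decays exponentially with distance, which resembles the kernel used in UMAP (UMAP additionally subtracts a local offset from the distance).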

Abstract

Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from algebraic topology without a clear probabilistic interpretation detaches them from commonly used theoretical frameworks in those areas. In this work we introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets. In particular, this perspective shows that the fuzzy weights of UMAP arise from a generative model that samples Vietoris-Rips filtrations at random scales, yielding cumulative distribution functions of pairwise distances. More generally, the framework connects fuzzy simplicial sets to probabilistic models on the face poset, clarifies the relation between Kullback-Leibler divergence and fuzzy cross-entropy in this setting, and recovers standard t-norms and t-conorms via Boolean operations on the underlying simplicial sets. We then show how new embedding methods may be derived from this framework and illustrate this with an example in which we generalize UMAP using Čech filtrations with triplet sampling. In summary, this viewpoint provides a unified probabilistic foundation for fuzzy simplicial sets, clarifies the role of UMAP within this framework, and enables the systematic derivation of new dimensionality reduction methods.
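The link between Boolean operations and t-norms can be made concrete for a single simplex. The sketch below is an illustration under stated assumptions, not the paper's construction: it tracks only whether one fixed simplex is present in two random simplicial sets A and B, with marginal probabilities `a` and `b`, and compares two hypothetical couplings. With independent sampling, intersection and union give the product t-norm and the probabilistic sum; with a shared source of randomness, they give minimum and maximum.

```python
import random


def estimate_intersection_union(a, b, coupling, n_samples=200_000, seed=0):
    """Estimate P(simplex in A and B) and P(simplex in A or B).

    coupling="independent": membership in A and B is sampled independently.
    coupling="shared": both memberships are driven by the same uniform draw.
    Both couplings are illustrative assumptions.
    """
    rng = random.Random(seed)
    both = 0
    either = 0
    for _ in range(n_samples):
        u = rng.random()
        v = u if coupling == "shared" else rng.random()
        in_a = u < a  # simplex present in A with probability a
        in_b = v < b  # simplex present in B with probability b
        both += in_a and in_b
        either += in_a or in_b
    return both / n_samples, either / n_samples


a, b = 0.3, 0.6
ind_and, ind_or = estimate_intersection_union(a, b, "independent")
sh_and, sh_or = estimate_intersection_union(a, b, "shared")
print(f"independent: and={ind_and:.3f} (a*b={a * b:.3f}), "
      f"or={ind_or:.3f} (a+b-a*b={a + b - a * b:.3f})")
print(f"shared:      and={sh_and:.3f} (min={min(a, b):.3f}), "
      f"or={sh_or:.3f} (max={max(a, b):.3f})")
```

The sketch ignores the closure constraints between a simplex and its faces; it only shows how the choice of joint distribution determines which t-norm and t-conorm appear.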

Paper Structure

This paper contains 30 sections, 32 theorems, 126 equations, 11 figures, 1 algorithm.

Key Result

Lemma 1

Let $[x_{i_0},...,x_{i_n}] \in X^{n+1}$ and let $\sigma$ be an arbitrary simplex. Let $\mu$ be the weight function of $\mathbf{S}([x_{i_0},...,x_{i_n}])$. Then $\mu(\sigma) = 1$ if and only if there exist maps $f_1,f_2,...,f_m$ such that $\sigma = (f_m \circ \dots \circ f_1)([x_{i_0},...,x_{i_n}])$, where all $f_j$ are face or degeneracy maps.
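The reachability condition in the lemma can be checked mechanically on small examples. The sketch below assumes the standard convention that a simplex is an ordered tuple of points, that the face map $d_j$ deletes the $j$-th entry, and that the degeneracy map $s_j$ repeats it; whether the paper uses exactly this encoding is an assumption, and all function names are hypothetical. A breadth-first search then collects every simplex reachable from the generating one.

```python
from collections import deque


def face(simplex, j):
    """Face map d_j: drop the j-th vertex."""
    return simplex[:j] + simplex[j + 1:]


def degeneracy(simplex, j):
    """Degeneracy map s_j: repeat the j-th vertex."""
    return simplex[:j + 1] + simplex[j:]


def generated_simplices(simplex, max_dim):
    """Simplices of dimension <= max_dim reachable from `simplex`
    by composing face and degeneracy maps (breadth-first search)."""
    start = tuple(simplex)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        neighbours = []
        if len(current) > 1:  # a 0-simplex has no faces
            neighbours += [face(current, j) for j in range(len(current))]
        if len(current) <= max_dim:  # the result has dimension len(current)
            neighbours += [degeneracy(current, j) for j in range(len(current))]
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def weight(sigma, simplex):
    """Return 1 if sigma is reachable from simplex, else 0."""
    max_dim = max(len(sigma), len(simplex)) - 1
    return 1 if tuple(sigma) in generated_simplices(simplex, max_dim) else 0


triangle = ("a", "b", "c")
print(weight(("a", "c"), triangle))       # a face of the triangle
print(weight(("a", "a", "b"), triangle))  # a degenerate simplex
print(weight(("c", "a"), triangle))       # wrong vertex order
```

Bounding the search by the larger of the two dimensions is sufficient because any composition of these maps can be rearranged into face maps followed by degeneracy maps.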

Figures (11)

  • Figure 1: Example of a finite fuzzy simplicial set/complex. Neither simplices with zero weight nor degenerate simplices are plotted.
  • Figure 2: Visualization of the Vietoris-Rips filtration. With growing scale $r$, all simplices whose diameter (maximum distance between any two vertices) is less than or equal to $r$ are added.
  • Figure 3: Illustration of the minimal simplicial set for a given simplex. All simplices that are plotted are assumed to have weight $1$, all that are not plotted have weight $0$. The minimal simplicial set simply contains the simplex and all of its faces and degeneracies (the latter are not plotted).
  • Figure 4: Illustration of the procedure for obtaining fuzzy weights from a probability distribution over a simplicial set. Top row shows a probability distribution over 4 simplicial sets (only the nondegenerate simplices with weight $1$ are shown). The bottom plot shows how the fuzzy weights for some of the simplices are obtained by computing the marginal probability of observing them in any of the simplicial sets.
  • Figure 5: Illustration of the probability distribution defined above. We sample radii according to a probability distribution $p(r)$. To determine the probability of a certain simplicial set $S$ like the one on the left, we first have to check whether it is a valid element of the VR filtration; otherwise its probability is zero. Then, the probability is determined by integrating $p(r)$ from $d_M(S)$ to $d_m(S)$. Here $d_M(S)$ is the radius at which the last simplex was added to $S$ (purple edge, or alternatively the filled triangle, which appears at the same time), and $d_m(S)$ is the lowest radius at which a new simplex would be added (green edge).
  • ...and 6 more figures
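The procedures described in the captions of Figures 4 and 5 can be sketched together on a toy example. The code below is a minimal illustration, not the paper's algorithm: the three points, their distances, the exponential radius distribution, and all function names are assumptions. Each stage of the Vietoris-Rips filtration receives the probability mass of $p(r)$ between the radius at which it appears and the radius at which the next simplex is added; the fuzzy weight of a simplex is then the total probability of the stages that contain it.

```python
import math
from itertools import combinations


def vr_simplices(dist, points, r):
    """Nondegenerate simplices of the Vietoris-Rips complex at scale r."""
    simplices = set()
    for k in range(1, len(points) + 1):
        for subset in combinations(points, k):
            diameter = max(
                (dist[frozenset(pair)] for pair in combinations(subset, 2)),
                default=0.0,  # a single vertex has diameter 0
            )
            if diameter <= r:
                simplices.add(subset)
    return frozenset(simplices)


def exponential_cdf(r, scale):
    """CDF of the (assumed) exponential radius distribution."""
    return 1.0 - math.exp(-r / scale)


def filtration_distribution(dist, points, scale):
    """Map each filtration stage to the probability of sampling it."""
    thresholds = sorted({0.0, *dist.values()}) + [math.inf]
    stages = {}
    for lo, hi in zip(thresholds, thresholds[1:]):
        stage = vr_simplices(dist, points, lo)
        # Integral of p(r) over [lo, hi), written via the CDF.
        stages[stage] = exponential_cdf(hi, scale) - exponential_cdf(lo, scale)
    return stages


def fuzzy_weight(stages, simplex):
    """Marginal probability that `simplex` is present."""
    return sum(p for stage, p in stages.items() if simplex in stage)


points = ("a", "b", "c")
dist = {
    frozenset(("a", "b")): 1.0,
    frozenset(("b", "c")): 1.5,
    frozenset(("a", "c")): 2.0,
}
stages = filtration_distribution(dist, points, scale=1.0)
for simplex in [("a",), ("a", "b"), ("b", "c"), ("a", "b", "c")]:
    print(simplex, round(fuzzy_weight(stages, simplex), 4))
```

In this toy setting the weight of an edge equals the tail probability of the radius distribution at the edge length, and the filled triangle inherits the weight of its longest edge.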

Theorems & Definitions (102)

  • Definition 1
  • Definition 2
  • Example 1
  • Definition 3
  • Definition 4
  • Remark 1
  • Definition 5
  • Example 2
  • Definition 6
  • Definition 7
  • ...and 92 more