On the cosine similarity and orthogonality between persistence diagrams

Azmeer Nordin, Mohd Salmi Md Noorani, Nurulkamal Masseran, Mohd Sabri Ismail, Nur Firyal Roslan

TL;DR

This work addresses the limitations of the standard similarity measures for persistence diagrams by proposing a cosine similarity defined on persistence landscapes, with orthogonality capturing perfect dissimilarity. It defines and analyzes the cosine indicator, establishes its basic properties and stability, and demonstrates its empirical advantage over the bottleneck and Wasserstein distances on synthetic data. The approach provides a robust, alignment-based notion of similarity for topological summaries, with potential for use alongside existing vectorizations. Overall, the cosine similarity offers a practical tool for more accurate comparative analyses in topological data analysis.

Abstract

Topological data analysis is an approach to studying the shape of a data set by means of topology. Its main object of study is the persistence diagram, which represents the topological features of the data set at different spatial resolutions. Multiple data sets can be compared through the similarity of their diagrams to understand their behavior relative to each other. The bottleneck and Wasserstein distances are often used as tools to indicate this similarity. In this paper, we introduce cosine similarity as a new indicator of the similarity between persistence diagrams and investigate its properties. Furthermore, it leads to a new notion of orthogonality between persistence diagrams. It turns out that orthogonality corresponds to perfect dissimilarity between persistence diagrams under the cosine similarity. Through a data demonstration, the cosine similarity is shown to be more accurate than the standard distances in measuring the similarity between persistence diagrams.
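
To make the idea concrete, here is a minimal sketch of a cosine similarity between two persistence diagrams. This summary does not reproduce the paper's exact definition, so the sketch rests on an assumption: each diagram is mapped to its persistence landscape, and the cosine is the $L^2$ inner product of the two landscapes divided by the product of their norms. The function names (`landscape_layers`, `l2_inner`, `cosine_similarity`), the uniform grid, and the Riemann-sum integration are illustrative choices, not the authors' implementation.

```python
import numpy as np


def landscape_layers(diagram, grid):
    """Sample the persistence landscape of a diagram on a 1-D grid.

    Row k of the result holds the (k+1)-th largest tent value at each
    grid point, i.e. the landscape layer lambda_{k+1}.
    """
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = pts[:, [0]]  # column vectors, so they broadcast against grid
    deaths = pts[:, [1]]
    # Tent function of each (birth, death) pair: max(0, min(t - b, d - t)).
    tents = np.maximum(0.0, np.minimum(grid - births, deaths - grid))
    # Sort every column in descending order to obtain the layers.
    return -np.sort(-tents, axis=0)


def l2_inner(layers_a, layers_b, dt):
    """Riemann-sum approximation of the L2 inner product of two landscapes."""
    # Layers beyond the shorter landscape are identically zero, so they
    # contribute nothing to the inner product and can be dropped.
    k = min(len(layers_a), len(layers_b))
    return float(np.sum(layers_a[:k] * layers_b[:k]) * dt)


def cosine_similarity(diagram_a, diagram_b, num=2001):
    """Cosine of the angle between two landscapes (assumed definition)."""
    a = np.asarray(diagram_a, dtype=float).reshape(-1, 2)
    b = np.asarray(diagram_b, dtype=float).reshape(-1, 2)
    both = np.vstack([a, b])
    # One shared grid covering every birth and death value.
    grid = np.linspace(both[:, 0].min(), both[:, 1].max(), num)
    dt = grid[1] - grid[0]
    la = landscape_layers(a, grid)
    lb = landscape_layers(b, grid)
    norm_a = np.sqrt(l2_inner(la, la, dt))
    norm_b = np.sqrt(l2_inner(lb, lb, dt))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("both diagrams need a feature with positive persistence")
    return l2_inner(la, lb, dt) / (norm_a * norm_b)


# Identical diagrams: the landscapes coincide, so the value is close to 1.
same = cosine_similarity([(0.0, 2.0), (1.0, 4.0)], [(0.0, 2.0), (1.0, 4.0)])

# Non-overlapping intervals: the landscapes have disjoint supports,
# so the inner product, and hence the cosine, is exactly 0.0.
disjoint = cosine_similarity([(0.0, 2.0)], [(3.0, 5.0)])
```

Because landscape functions are non-negative, the value under this assumed definition lies in $[0, 1]$. A value of 0 is the orthogonal case, which the abstract describes as perfect dissimilarity.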

Paper Structure

This paper contains 12 sections, 6 theorems, 58 equations, 4 figures, 6 tables.

Key Result

Proposition 1

Let $D_1$ and $D_2$ be non-empty persistence diagrams. Fix a real $\eta>0$ such that [...]. Then there are positive constants $c_1$ and $c_2$, which depend on $D_1$ and $D_2$, such that [...] for any pair of non-empty persistence diagrams $\tilde{D}_1$ and $\tilde{D}_2$ with $W_2(D_1, \tilde{D}_1) \leq \eta$ and $W_2(D_2, \tilde{D}_2) \leq \eta$.

Figures (4)

  • Figure 1: Persistence diagram and landscape for a sample from unit disc in $\mathbb{R}^2$
  • Figure 2: Persistence diagrams for data sets $Q$ and $Q'$
  • Figure 3: Persistence diagrams for data sets $R$ and $R'$
  • Figure 4: Persistence diagrams for data sets $S$ and $S'$

Theorems & Definitions (18)

  • Example 1
  • Example 2
  • Definition 1
  • Proposition 1
  • proof
  • Corollary 2
  • proof
  • Definition 2
  • Example 3: Examples 1 and 2 revisited
  • Definition 3
  • ...and 8 more