Merging Hazy Sets with m-Schemes: A Geometric Approach to Data Visualization

Lukas Silvester Barth, Hannaneh Fahimi, Parvaneh Joharinad, Jürgen Jost, Janis Keck

TL;DR

This work tackles the challenge of visualizing high-dimensional metric data by formalizing a merging framework for dissimilarities through m-schemes to produce an uber-metric representation suitable for 2D embeddings. It extends UMAP by adopting hazy sets and category-theoretic merging, enabling density-aware, globally coherent visualizations that respect local neighborhoods and global geometry. The paper grounds the method in Riemannian geometry and dissimilarity theory, introduces precise constructions (Sm, Ti, m-schemes), and demonstrates IsUMap via toy manifolds and systematic comparisons across multiple merging schemes. Overall, the approach offers a rigorous, flexible toolkit for geometry-aware data visualization with potential improvements in preserving cluster structure and topology in 2D plots.

Abstract

Many machine learning algorithms try to visualize high-dimensional metric data in 2D in such a way that the essential geometric and topological features of the data are highlighted. In this paper, we introduce a framework for aggregating dissimilarity functions that arise from locally adjusting a metric through density-aware normalization, as employed in the IsUMap method. We formalize these approaches as m-schemes, a class of methods closely related to t-norms and t-conorms in probabilistic metrics, as well as to composition laws in information theory. These m-schemes provide a flexible and theoretically grounded approach to refining distance-based embeddings.
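For readers unfamiliar with the t-norms and t-conorms mentioned in the abstract, the following is a minimal sketch of three standard textbook t-norms on [0, 1] and the t-conorms obtained from them by duality. It only illustrates the kind of binary aggregation rule that m-schemes are said to be related to; it is not the paper's definition of an m-scheme, and all function names are hypothetical.

```python
# Standard t-norms on [0, 1] (textbook definitions, not the paper's m-schemes).

def t_min(a: float, b: float) -> float:
    """Minimum (Goedel) t-norm."""
    return min(a, b)


def t_product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def dual_conorm(t_norm):
    """Build the dual t-conorm S(a, b) = 1 - T(1 - a, 1 - b)."""
    def s(a: float, b: float) -> float:
        return 1.0 - t_norm(1.0 - a, 1.0 - b)
    return s


s_max = dual_conorm(t_min)                  # equals max(a, b)
s_prob_sum = dual_conorm(t_product)         # equals a + b - a * b
s_bounded_sum = dual_conorm(t_lukasiewicz)  # equals min(1, a + b)

if __name__ == "__main__":
    a, b = 0.3, 0.6
    for name, s in [("max", s_max),
                    ("probabilistic sum", s_prob_sum),
                    ("bounded sum", s_bounded_sum)]:
        print(f"{name}: {s(a, b):.2f}")
```

Each t-conorm merges two values in [0, 1] into one, and different choices give different merged values for the same inputs, which is the sense in which the choice of merging rule matters.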

Paper Structure

This paper contains 14 sections, 3 theorems, 31 equations, 5 tables.

Key Result

Proposition 3.1

The haziness of a simplex is at least the maximum of the hazinesses of its faces; more simply, a simplex is at least as hazy as each of its faces. Moreover, every degeneracy of a simplex is as hazy as the simplex itself.
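In symbols, the statement can be sketched as follows. The notation is an assumption made here for illustration and may differ from the paper's: $h$ denotes haziness, $d_i$ the face maps and $s_j$ the degeneracy maps of an $n$-simplex $\sigma$, and "as hazy as" is read as equality.

```latex
\[
  h(\sigma) \;\ge\; \max_{0 \le i \le n} h(d_i \sigma) \quad (n \ge 1),
  \qquad
  h(s_j \sigma) \;=\; h(\sigma) \quad (0 \le j \le n).
\]
```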

Theorems & Definitions (9)

  • Definition 2.1
  • Definition 2.2
  • Definition 3.1
  • Proposition 3.1
  • Definition 3.2
  • Theorem 3.1
  • Definition 3.3
  • Definition 3.4
  • Lemma 1