Mixup Barcodes: Quantifying Geometric-Topological Interactions between Point Clouds

Hubert Wagner, Nickolas Arustamyan, Matthew Wheeler, Peter Bubenik

TL;DR

The paper introduces mixup barcodes, a new topological descriptor that couples standard persistence with image persistence to quantify interactions between point clouds. Given a filtration $L$ included in a filtration $K$, the authors provide a practical algorithm and software that compute the mixup barcode and summarize it with simple statistics such as total mixup and total mixup percentage. Applied to embeddings from neural network training, the method reveals geometric-topological interactions that correlate with disentanglement across layers and with data difficulty, and it captures inter-class structure that conventional persistence misses. This work expands topological data analysis toward interaction-aware descriptors and suggests broad applicability in science and engineering domains where spatial relations between components matter. Specifically, it aligns with the Chromatic TDA direction and demonstrates potential for diagnosing training dynamics and guiding regularization in high-dimensional settings.

Abstract

We combine standard persistent homology with image persistent homology to define a novel way of characterizing shapes and interactions between them. In particular, we introduce: (1) a mixup barcode, which captures geometric-topological interactions (mixup) between two point sets in arbitrary dimension; (2) simple summary statistics, total mixup and total percentage mixup, which quantify the complexity of the interactions as a single number; (3) a software tool for playing with the above. As a proof of concept, we apply this tool to a problem arising from machine learning. In particular, we study the disentanglement in embeddings of different classes. The results suggest that topological mixup is a useful method for characterizing interactions for low and high-dimensional data. Compared to the typical usage of persistent homology, the new tool is sensitive to the geometric locations of the topological features, which is often desirable.

Paper Structure (11 sections, 3 theorems, 3 equations, 8 figures, 2 algorithms)

Key Result

Theorem 1

There is a canonical surjective matching from the barcode of $H_k(L)$ to the barcode of $\mathop{\mathrm{im}}\nolimits(H_k(\iota))$ which matches intervals with the same birth index.
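To make the birth-index matching concrete, here is a minimal sketch that pairs bars by birth index. It is not the authors' algorithm: it assumes both barcodes are already computed, that each birth index occurs at most once (one cell added per filtration step), and the function name `match_by_birth` is hypothetical. The input bars are read off the degree-1 example of Figure 1, assuming its triples are ordered as (birth, death in the image, death in $L$).

```python
from typing import Dict, List, Tuple

Bar = Tuple[int, int]          # half-open interval [birth, death), filtration indices
Triple = Tuple[int, int, int]  # (birth, death in image, death in L); ordering is an assumption


def match_by_birth(l_bars: List[Bar], image_bars: List[Bar]) -> Tuple[List[Triple], List[Bar]]:
    """Pair every image bar with the bar of H_k(L) that has the same birth index.

    Returns the matched triples and the bars of L left without a partner.
    Assumes birth indices are unique within each barcode.
    """
    death_by_birth: Dict[int, int] = {}
    for birth, death in l_bars:
        if birth in death_by_birth:
            raise ValueError(f"duplicate birth index {birth}")
        death_by_birth[birth] = death

    triples: List[Triple] = []
    for birth, image_death in image_bars:
        if birth not in death_by_birth:
            raise ValueError(f"image bar born at {birth} has no partner in L")
        # Surjectivity: every image bar receives exactly one bar of L.
        triples.append((birth, image_death, death_by_birth.pop(birth)))

    unmatched = sorted(death_by_birth.items())
    return sorted(triples), unmatched


# Bars implied by the Figure 1 caption (degree 1).
l_bars = [(1, 6), (2, 5)]
image_bars = [(1, 4), (2, 3)]
triples, unmatched = match_by_birth(l_bars, image_bars)
print(triples)    # [(1, 4, 6), (2, 3, 5)]
print(unmatched)  # []
```

The sketch returns unmatched bars of $L$ separately rather than assigning them an image death, since how such bars are treated is not stated in this summary.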

Figures (8)

  • Figure 1: In this example $K$ is a cylinder capped with disks. The dark cells (two circles, one disk and the cylinder) come from $L$, and the light cells (the two disks) come from $K \setminus L$. The new disk at $L_6$ is meant as another cell bounding the circle, distinct from the disk at $K_4$ (consistent with our assumptions). The mixup triples of $L \hookrightarrow K$ in degree $1$ are $((1,4,6), (2,3,5))$, as illustrated with the mixup barcode plotted above. In particular, the mixup sub-bars $([4,6), [3,5))$ quantify the shortening of the lifetime of the two cycles. The kernel persistence bars, $([3,6), [4,5))$, are different and do not correspond to the shortening of the lifetimes of the cycles.
  • Figure 2: This example is presented differently. The top filtration is $L$ and the bottom filtration is $K$. The $1$-cells are half-circles and a segment, and the $2$-cells are half-disks. We assume that the cells added to $L_6$ and $L_7$ are distinct from the cells added in $K_5$ and $K_4$. We consider the mixup barcode of $L \hookrightarrow K$ in degree $1$.
  • Figure 3: Top: Two point clouds: $R$ red (dark) and $B$ blue (light). Each samples two objects embedded in $\mathbb{R}^3$ with nontrivial topology interacting in nontrivial ways. Bottom: Mixup barcodes of $R$ and $B$ included in their union, in degrees $2,1,0$. Bottom left: In $H_2$, we see a prominent persistence bar with a long mixup bar. It detects the red void partially filled by blue points, interpreted as the red object surrounding a part of the blue object. Similarly in $H_1$ the prominent mixup bar detects the red handle (tunnel) that encircles the blue points. No significant mixup occurs in $H_0$, although the many shorter mixup bars detect the overlap between the two shapes. Bottom right: In $H_1$ the mixup barcode detects the handle encircled by the red object, with the late birth indicating that a large part is missing. The prominent bar for $H_0$ detects that the two blue connected components connect quicker via the red points. We interpret this as separation of the two blue parts by the red object. As expected, there is no prominent persistence or mixup in $H_2$.
  • Figure 4: The three point clouds are the (soft) predictions produced by a trained machine learning model for three different classes of examples. The trajectories track the predictions during training, starting from a random state. We stress this does not show the disentanglement process directly, as it occurs in high dimensions as the data passes through the layers. It does show the evolution of the output of the last layer.
  • Figure 5: Left: The mean mixup percentage between all pairs of MNIST classes. Most values are low, as expected due to linear separability. Right: The mixup barcode between the images of fours and nines which achieve the greatest mixup. Many bars with small, but positive, mixup suggest that the two point clouds overlap along a long but shallow interface.
  • ...and 3 more figures
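As a worked example of the summary statistics, the sketch below evaluates the mixup triples from the Figure 1 caption. It rests on assumptions not spelled out on this page: a triple is read as (birth, death in the image, death in $L$), the mixup of a bar is the length of its mixup sub-bar, the mixup percentage divides that by the bar's persistence, and the total mixup percentage is taken as total mixup over total persistence. All function names are hypothetical, and the paper's exact normalization may differ.

```python
from typing import List, Tuple

# (birth, death in image, death in L); this ordering is an assumption
# consistent with the sub-bars [4,6) and [3,5) quoted for Figure 1.
Triple = Tuple[float, float, float]


def mixup(t: Triple) -> float:
    """Length of the mixup sub-bar, i.e. how much the lifetime is shortened."""
    _, image_death, death = t
    return death - image_death


def mixup_percentage(t: Triple) -> float:
    """Mixup as a fraction of the bar's persistence (assumed definition)."""
    birth, _, death = t
    return mixup(t) / (death - birth)


def total_mixup(triples: List[Triple]) -> float:
    return sum(mixup(t) for t in triples)


def total_mixup_percentage(triples: List[Triple]) -> float:
    """Total mixup over total persistence (assumed normalization)."""
    total_persistence = sum(death - birth for birth, _, death in triples)
    return total_mixup(triples) / total_persistence if total_persistence else 0.0


figure1_triples = [(1, 4, 6), (2, 3, 5)]
print([mixup(t) for t in figure1_triples])      # [2, 2]
print(total_mixup(figure1_triples))             # 4
print(total_mixup_percentage(figure1_triples))  # 0.5
```

The example uses filtration indices, as in Figure 1. For point clouds one would presumably pass filtration values (scales) instead, which is why the type alias allows floats.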

Theorems & Definitions (13)

  • Theorem 1: Matching (Bauer and Lesnick, 2015)
  • Definition 2: Mixup barcode
  • Definition 3: Mixup triple
  • Definition 4: Persistence bar and image and mixup sub-bars
  • Definition 5: Premature death
  • Definition 6: Mixup
  • Definition 7: Total mixup
  • Definition 8: Mixup percentage
  • Definition 9: Total mixup percentage
  • Definition 10: Successor rule
  • ...and 3 more