A Distance for Geometric Graphs via the Labeled Merge Tree Interleaving Distance
Erin Wolf Chambers, Elizabeth Munch, Sarah Percival, Xinyi Wang
TL;DR
This work defines a distance between embedded graphs by converting each graph into a family of labeled merge trees via a directional transform and integrating the labeled interleaving distance over all directions in $S^1$. The approach relies on a surjective labeling scheme that pairs extremal vertices of one graph with the closest points on the other graph, so that the distance $D(G_1,G_2)$ can be computed from the matrices induced by the labeled merge trees. The authors give both an exact computation method (using a kinetic data structure) and an approximate one (by sampling directions), analyze theoretical properties (finiteness, symmetry, continuity, and the fact that the distance is not a true metric), and demonstrate the method on data sets including Passiflora leaves and letter graphs. The resulting distance is sensitive to the embedding and yields meaningful clustering and visualization results; an implementation is publicly available. Future directions include stabilizing the labeling, extending to higher dimensions, and scaling to larger data sets.
Abstract
Geometric graphs appear in many real-world data sets, such as road networks, sensor networks, and molecules. We investigate the notion of distance between embedded graphs and present a metric to measure the distance between two geometric graphs via merge trees. In order to preserve as much useful information as possible from the original data, we introduce a way of rotating the sublevel set to obtain the merge trees via the idea of the directional transform. We represent the merge trees using a surjective multi-labeling scheme and then compute the distance between two representative matrices. We show some theoretically desirable qualities and present two methods of computation: approximation via sampling and exact distance using a kinetic data structure, both in polynomial time. We illustrate its utility by implementing it on two data sets.
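To make the sampling-based approximation concrete, here is a minimal Python sketch, not the authors' implementation. It makes several simplifying assumptions: each graph is connected, its labels are already placed on vertices and are held fixed across directions (the paper's labeling scheme pairs extremal vertices with closest points on the other graph and is not reproduced here), and the integral over $S^1$ is replaced by an average over evenly spaced directions. The names `merge_matrix` and `sampled_distance` are hypothetical. For one direction, entry $(i,j)$ of the matrix is the height at which labels $i$ and $j$ first share a sublevel-set component, and the labeled interleaving distance is taken as the largest entrywise difference between the two matrices.

```python
import math


def merge_matrix(coords, edges, labels, theta):
    """Matrix of a labeled merge tree for the height function in direction theta.

    coords: list of (x, y) vertex positions
    edges:  list of (u, v) vertex index pairs (graph assumed connected)
    labels: labels[i] is the index of the vertex carrying label i (assumption:
            labels sit on vertices and do not depend on theta)
    """
    ux, uy = math.cos(theta), math.sin(theta)
    h = [x * ux + y * uy for x, y in coords]  # height of each vertex

    parent = list(range(len(coords)))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    n = len(labels)
    M = [[0.0] * n for _ in range(n)]
    # Labels on the same vertex (including a label with itself) meet at its height.
    for i, v in enumerate(labels):
        for j, w in enumerate(labels):
            if v == w:
                M[i][j] = h[v]

    # members[root] lists the labels currently in that component.
    members = {}
    for i, v in enumerate(labels):
        members.setdefault(v, []).append(i)

    # An edge enters the sublevel set at the height of its higher endpoint.
    for u, v in sorted(edges, key=lambda e: max(h[e[0]], h[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        level = max(h[u], h[v])
        a, b = members.pop(ru, []), members.pop(rv, [])
        for i in a:
            for j in b:
                M[i][j] = M[j][i] = level  # first height at which i and j connect
        parent[ru] = rv
        members[rv] = a + b
    return M


def sampled_distance(g1, g2, n_dirs=64):
    """Average over n_dirs directions of the max entrywise matrix difference."""
    total = 0.0
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        A = merge_matrix(*g1, theta)
        B = merge_matrix(*g2, theta)
        total += max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))
    return total / n_dirs


# Usage: a three-vertex path and a copy translated one unit upward,
# each given as (coords, edges, labels) with labels on the two endpoints.
path = ([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)], [(0, 1), (1, 2)], [0, 2])
moved = ([(x, y + 1.0) for x, y in path[0]], path[1], path[2])
d = sampled_distance(path, moved)
```

Both graphs must carry the same number of labels for the matrices to be comparable. Increasing `n_dirs` refines the approximation at a cost linear in the number of sampled directions.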
