Distributed Path Compression for Piecewise Linear Morse-Smale Segmentations and Connected Components

Michael Will, Jonas Lukasczyk, Julien Tierny, Christoph Garth

Abstract

This paper describes how a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression is adapted to a distributed computational setting. Additionally, we extend the algorithm to efficiently compute connected components in distributed structured and unstructured grids, based either on the connectivity of the underlying mesh or on a feature mask. Our implementation is integrated into the distributed extension of the Topology ToolKit (TTK). To demonstrate the practicality and efficiency of our algorithms, we conducted a series of scaling experiments on large-scale datasets with up to $4096^3$ vertices, using up to 64 nodes and 768 cores.
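To illustrate the connected-component extension on a single rank, here is a minimal Python sketch that labels the components of a feature mask on a 2D structured grid. It assumes 4-neighbour connectivity and row-major vertex ids; both are illustrative assumptions, and this is not the TTK implementation. The first pass lets every masked vertex point to its largest-id masked neighbour and compresses these pointers, which yields sub-segmentations. The second pass merges adjacent sub-segmentations with a union-find that always attaches the lower target id to the higher one, following the merging rule described for Figure 3. Each vertex therefore ends up with the largest vertex id of its component. The names `label_components` and `_find` are hypothetical.

```python
from typing import List


def _find(parent: List[int], x: int) -> int:
    """Return the root of x and compress the path behind it."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def label_components(mask: List[List[int]]) -> List[List[int]]:
    """Label each masked vertex with the largest vertex id of its 4-connected component.

    Unmasked vertices get -1. Vertex ids are row-major: id = y * width + x.
    """
    h, w = len(mask), len(mask[0])
    n = h * w

    def masked(v: int) -> bool:
        return bool(mask[v // w][v % w])

    def neighbours(v: int):
        y, x = divmod(v, w)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                yield ny * w + nx

    # Pass 1: point every masked vertex to its largest-id masked neighbour
    # (or to itself if no neighbour has a larger id), then compress the
    # pointers until every vertex points directly to its target.
    ptr = list(range(n))
    for v in range(n):
        if masked(v):
            ptr[v] = max([v, *neighbours(v)])
    changed = True
    while changed:
        changed = False
        for v in range(n):
            if ptr[ptr[v]] != ptr[v]:
                ptr[v] = ptr[ptr[v]]
                changed = True

    # Pass 2: merge adjacent sub-segmentations; the target with the lower id
    # is always attached to the target with the higher id.
    parent = list(range(n))
    for v in range(n):
        if not masked(v):
            continue
        for u in neighbours(v):
            a, b = _find(parent, ptr[v]), _find(parent, ptr[u])
            if a != b:
                parent[min(a, b)] = max(a, b)

    return [
        [_find(parent, ptr[y * w + x]) if mask[y][x] else -1 for x in range(w)]
        for y in range(h)
    ]
```

For example, `label_components([[1, 1, 1], [1, 0, 0]])` produces two sub-segmentations in the first pass (targets 2 and 3). The second pass merges them, so all four masked vertices are labelled 3.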

Paper Structure (22 sections, 1 equation, 8 figures, 3 tables, 3 algorithms)

Figures (8)

  • Figure 1: Connected-component extraction for the ctBones [ttk-data], magnetic reconnection [magnetic_reconnection], and AT complex [ttk-data] datasets, based on a threshold. The extracted components characterize the bones of the foot, high-density boundaries, and low-density areas, respectively. Running these computations on multiple nodes makes it possible to process much larger datasets by using the combined distributed memory of all nodes.
  • Figure 2: Illustration of the distributed path compression (DPC) procedure for one connected polyline (top left). The polyline has the shape of a spiral and is distributed across four ranks, whose boundaries are shown as red dashed lines. To compute connectivity, every rank needs one layer of ghost vertices ((a), dashed nodes and edges). Note that in VTK a vertex can be a ghost vertex on multiple ranks, but every vertex belongs to exactly one rank, called the vertex owner. The goal of the DPC procedure is to assign to every vertex the largest vertex identifier of its connected component, here $P$. In the first step of DPC, every rank computes a path compression for all its non-ghost vertices ((b)). For instance, after this step rank $R_3$ assigns the ghost vertex $F$ to vertices $D$ and $E$, and vertex $O$ points towards vertex $P$. For details on path compression on a single rank, we refer the reader to the work of Maack et al. [maack_parallel_2023] and the summary in Section \ref{sec:asc_desc}. The next step is a cross-rank communication in which all ghost vertices retrieve the current pointers of their owners (table, column $P_0$). For example, ghost vertex $A$ is owned by rank $R_0$ and currently points towards $B$, so rank $R_1$ (which contains $A$ as a ghost vertex) retrieves this assignment. Next, DPC performs a path compression on the ghost vertices (table, columns $P_1, P_2, P_3$). Here, after three iterations all ghost vertices point towards vertex $P$, and this result is communicated across ranks ((c)). Finally, every rank performs one more iteration of local path compression to correctly update all pointers ((d)).
  • Figure 3: Before (left) and after (right) applying the second-pass path compression that merges the sub-segmentations within each connected component. The exact merging rule does not matter, as long as it is applied consistently. We chose to attach the segmentation whose target has the lower id to the one whose target has the higher id. In our example dataset, id generation is dominated by the y-direction; therefore, the remaining label is the one whose segment stretches furthest in the positive x-direction.
  • Figure 4: Timings of the strong-scaling experiments for DPC on Perlin noise at grid sizes of $512^3$ (left) and $1024^3$ (right) vertices.
  • Figure 5: Illustration of parallel speedup and efficiency for DPC on Perlin noise at $512^3$ (dashed blue line) and $1024^3$ (solid blue line) vertices. The left plot shows the parallel speedup, defined as the runtime on one node divided by the runtime on $n$ nodes; perfect scalability is marked by the dashed gray line. The right plot shows the parallel efficiency, defined as the speedup divided by the number of nodes. The plots show that the distributed step of path compression does not scale well, because adding nodes significantly increases the amount of communication required.
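The DPC steps walked through in Figure 2 can be sketched as a single-process Python simulation. This is a minimal sketch under stated assumptions, not the TTK code. The input is a hypothetical pointer forest `ptr`, in which every vertex points to one neighbour and roots point to themselves, and `owner[v]` gives the rank that owns vertex `v`. The ghost layer of each rank is reduced to the pointer targets owned by other ranks; a real rank keeps all neighbours in its ghost layer. Every cross-rank message is modelled as a plain dictionary lookup. The function name `simulate_dpc` is hypothetical.

```python
from typing import Dict, List, Set


def simulate_dpc(ptr: List[int], owner: List[int], num_ranks: int) -> List[int]:
    """Single-process simulation of distributed path compression (DPC).

    ptr[v]   -- hypothetical input pointer forest: the neighbour vertex v points to
                (roots point to themselves).
    owner[v] -- rank that owns vertex v.
    Returns, for every vertex, the root its pointer path ends in.
    """
    n = len(ptr)
    # Each rank stores pointers only for the vertices it owns.
    local: List[Dict[int, int]] = [
        {v: ptr[v] for v in range(n) if owner[v] == r} for r in range(num_ranks)
    ]
    # Ghost vertices of a rank: pointer targets that another rank owns.
    ghosts: List[Set[int]] = [
        {t for t in local[r].values() if owner[t] != r} for r in range(num_ranks)
    ]

    # (b) Local path compression on non-ghost vertices: follow owned pointers
    # until reaching a root or leaving the rank (i.e. hitting a ghost vertex).
    for p in local:
        for v in p:
            t = p[v]
            while t in p and p[t] != t:
                t = p[t]
            p[v] = t

    # Communication: every ghost vertex retrieves its owner's current pointer
    # (column P_0 in the table of Figure 2).
    table = {g: local[owner[g]][g] for r in range(num_ranks) for g in ghosts[r]}

    # (c) Path compression on ghost vertices; each synchronous round corresponds
    # to one further column (P_1, P_2, ...) and, in a real run, one exchange.
    while True:
        nxt = {g: table.get(t, t) for g, t in table.items()}
        if nxt == table:
            break
        table = nxt

    # (d) One final local pass: pointers that stop at a ghost vertex take the
    # ghost's resolved target; pointers to local roots stay unchanged.
    result = [0] * n
    for p in local:
        for v, t in p.items():
            result[v] = table.get(t, t)
    return result
```

For example, `simulate_dpc([1, 2, 3, 4, 5, 6, 7, 7], [0, 0, 1, 1, 0, 0, 1, 1], 2)` returns `[7] * 8`. After the local step, rank 0 only knows that vertices 0 and 1 lead to its ghost vertex 2. The ghost-vertex compression then resolves the chain 2 → 4 → 6 → 7 across ranks.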
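The two metrics plotted in Figure 5 can be written as $S(n) = T(1) / T(n)$ and $E(n) = S(n) / n$, where $T(n)$ is the runtime on $n$ nodes. A minimal Python helper that computes both from a mapping of node counts to measured runtimes could look as follows. The function name and the input format are assumptions, and no timings from the paper are reproduced here.

```python
from typing import Dict, Tuple


def speedup_and_efficiency(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {n: (speedup, efficiency)} relative to the single-node runtime.

    runtimes maps a node count n to the measured runtime T(n); it must contain n = 1.
    """
    t1 = runtimes[1]
    result = {}
    for n, tn in sorted(runtimes.items()):
        speedup = t1 / tn                   # S(n) = T(1) / T(n)
        result[n] = (speedup, speedup / n)  # E(n) = S(n) / n
    return result
```

Perfect scalability corresponds to $S(n) = n$, or equivalently $E(n) = 1$. This is the reference marked by the dashed gray line in the left plot of Figure 5.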