Tasking framework for Adaptive Speculative Parallel Mesh Generation

Christos Tsolakis, Polykarpos Thomadakis, Nikos Chrisochoides

TL;DR

This work addresses the rising complexity of mesh generation codes and diverse hardware by introducing a generic tasking framework that separates functionality from performance. The front-end provides a hardware-agnostic API for speculative mesh operations, while back-ends (Argobots, TBB, OpenMP) realize load balancing and scheduling. Applied to CDT3D and PODM, the framework yields per-operation speedups, improved scalability at high core counts, and stable mesh quality, demonstrating portability across back-ends and potential for exascale integration. The results support the Telescopic Approach by decoupling thread management from mesh kernels, enabling reusable components and easier future optimizations, with room for online tuning and broader back-end support in future work.

Abstract

Handling the ever-increasing complexity of mesh generation codes along with the intricacies of newer hardware often results in codes that are difficult both to comprehend and to maintain. Different facets of these codes, such as thread management and load balancing, are often intertwined, resulting in efficient but highly complex software. In this work, we present a framework that aids in establishing a core principle, known as separation of concerns, where functionality is separated from the performance aspects of various mesh operations. In particular, thread management and scheduling decisions are elevated into a generic and reusable tasking framework. The results indicate that our approach can successfully abstract the load balancing aspects of two case studies, while providing access to a plethora of different execution back-ends. One would expect this new flexibility to come at some additional cost. However, for the configurations studied in this work, we observed up to 13% speedup for some meshing operations and up to 5.8% speedup over the entire application runtime compared to hand-optimized code. Moreover, we show that by using different task creation strategies, the overhead compared to straightforward task execution models can be improved dramatically, by as much as 1200%, without compromises in portability and functionality.
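The separation of concerns described in the abstract can be illustrated with a minimal sketch. This is not the paper's API: `refine`, `run_flat`, and `run_chunked` are hypothetical names, and Python's `ThreadPoolExecutor` stands in for a real back-end such as Argobots, TBB, or OpenMP. The sketch assumes only that a mesh operation can be written as a function of one element, with no knowledge of how tasks are created or scheduled.

```python
from concurrent.futures import ThreadPoolExecutor


def refine(cell):
    """Hypothetical stand-in for a mesh operation on a single element."""
    return cell * cell


def run_flat(executor, op, items):
    """Flat model: create one task per item."""
    futures = [executor.submit(op, item) for item in items]
    return [f.result() for f in futures]


def run_chunked(executor, op, items, grainsize):
    """Coarser tasks: each task processes `grainsize` consecutive items."""
    def run_chunk(chunk):
        return [op(item) for item in chunk]

    chunks = [items[i:i + grainsize] for i in range(0, len(items), grainsize)]
    futures = [executor.submit(run_chunk, chunk) for chunk in chunks]
    results = []
    for f in futures:
        results.extend(f.result())  # preserve the original item order
    return results


if __name__ == "__main__":
    cells = list(range(8))
    with ThreadPoolExecutor(max_workers=4) as pool:
        # The operation is identical in both calls; only the
        # task creation strategy changes.
        print(run_flat(pool, refine, cells))
        print(run_chunked(pool, refine, cells, grainsize=3))
```

Both drivers call the same operation and return the same results, so the task creation strategy or the executor can be swapped without touching `refine`. That interchangeability is the property the framework aims for, here shown only in miniature.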

Paper Structure (22 sections, 15 figures, 4 tables)


Figures (15)

  • Figure 1: Pseudocodes of the speculative approach applied to a Delaunay-based algorithm (left) and a local reconnection operation (right) of an Advancing-Front method. Colored regions indicate the primal function of the enclosed steps.
  • Figure 2: Different tasking paradigms employed in this work. Left: flat model, Middle: two-level task creation, Right: hierarchical task creation.
  • Figure 3: Mesh operations in CDT3D.
  • Figure 4: Left: Normalized total running time of high level constructs and the flat model. Right: zoom-in at the range $0.5$-$2.0$.
  • Figure 5: Left: Normalized total running time of the three task creation strategies implemented across the three different back-ends. The grainsize is fixed to $1$. Right: zoom-in at the range $0.6$-$2.0$.
  • ...and 10 more figures