Tasking framework for Adaptive Speculative Parallel Mesh Generation
Christos Tsolakis, Polykarpos Thomadakis, Nikos Chrisochoides
TL;DR
This work addresses the rising complexity of mesh generation codes and the growing diversity of hardware by introducing a generic tasking framework that separates functionality from performance. The front-end provides a hardware-agnostic API for speculative mesh operations, while interchangeable back-ends (Argobots, TBB, OpenMP) implement load balancing and scheduling. Applied to CDT3D and PODM, the framework yields per-operation speedups, better scalability at high core counts, and stable mesh quality, demonstrating portability across back-ends and suggesting potential for exascale integration. By decoupling thread management from mesh kernels, the results support the Telescopic Approach and enable reusable components and easier future optimizations; online tuning and support for additional back-ends are left for future work.
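The front-end/back-end split can be illustrated with a minimal Python sketch. It is not the framework's actual API: `Backend`, `SerialBackend`, `ThreadPoolBackend`, `for_each_speculative`, and the toy kernel `refine_element` are hypothetical names, and Python's `concurrent.futures` thread pool stands in for Argobots, TBB, or OpenMP. The mesh kernel only reports whether its speculative attempt succeeded; the front-end retries items that hit a conflict, and only the back-end decides how tasks are executed.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Iterable, List


class Backend(ABC):
    """Hypothetical back-end interface: the only layer that decides how tasks run."""

    @abstractmethod
    def run_all(self, tasks: List[Callable[[], None]]) -> None:
        """Execute every task and return once all of them have finished."""


class SerialBackend(Backend):
    def run_all(self, tasks: List[Callable[[], None]]) -> None:
        for task in tasks:
            task()


class ThreadPoolBackend(Backend):
    """Stand-in for a real tasking runtime (the paper uses Argobots, TBB, OpenMP)."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run_all(self, tasks: List[Callable[[], None]]) -> None:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(task) for task in tasks]
            for future in futures:
                future.result()  # re-raise any exception raised inside a task


def for_each_speculative(items: Iterable[int], op: Callable[[int], bool],
                         backend: Backend) -> int:
    """Front-end (hypothetical): apply a speculative op to every item, retrying
    items whose attempt reported a conflict. Returns the number of rounds run."""
    pending = list(items)
    rounds = 0
    while pending:
        rounds += 1
        failed: List[int] = []
        failed_lock = Lock()

        def attempt(item: int) -> None:
            if not op(item):  # conflict: the kernel rolled back, retry next round
                with failed_lock:
                    failed.append(item)

        backend.run_all([lambda i=item: attempt(i) for item in pending])
        if failed and len(failed) == len(pending):
            backend = SerialBackend()  # no progress: finish serially (assumed policy)
        pending = failed
    return rounds


# Toy 1-D "mesh" (hypothetical): element e owns vertices e and e + 1.
NUM_ELEMENTS = 100
vertex_locks = [Lock() for _ in range(NUM_ELEMENTS + 1)]
vertex_updates = [0] * (NUM_ELEMENTS + 1)


def refine_element(e: int) -> bool:
    """Speculatively lock both vertices of element e; on conflict, roll back."""
    held = []
    for v in (e, e + 1):
        if not vertex_locks[v].acquire(blocking=False):
            for lock in held:
                lock.release()
            return False
        held.append(vertex_locks[v])
    vertex_updates[e] += 1      # protected by vertex_locks[e]
    vertex_updates[e + 1] += 1  # protected by vertex_locks[e + 1]
    for lock in held:
        lock.release()
    return True


if __name__ == "__main__":
    for backend in (SerialBackend(), ThreadPoolBackend(workers=4)):
        vertex_updates[:] = [0] * (NUM_ELEMENTS + 1)
        rounds = for_each_speculative(range(NUM_ELEMENTS), refine_element, backend)
        # End vertices belong to one element, interior vertices to two.
        assert vertex_updates[0] == vertex_updates[-1] == 1
        assert all(c == 2 for c in vertex_updates[1:-1])
        print(f"{type(backend).__name__}: {rounds} round(s)")
```

Swapping `SerialBackend()` for `ThreadPoolBackend()` leaves the kernel untouched, which is the separation of concerns described above. The serial fallback after a round with no progress is an assumption added here to guarantee termination; it presumes lock conflicts are the only reason an attempt fails.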
Abstract
Handling the ever-increasing complexity of mesh generation codes along with the intricacies of newer hardware often results in codes that are both difficult to comprehend and maintain. Different facets of such codes, such as thread management and load balancing, are often intertwined, resulting in efficient but highly complex software. In this work, we present a framework that helps establish a core principle, known as separation of concerns, where functionality is separated from the performance aspects of various mesh operations. In particular, thread management and scheduling decisions are elevated into a generic and reusable tasking framework. The results indicate that our approach can successfully abstract the load balancing aspects of two case studies, while providing access to a variety of execution back-ends. One would expect this added flexibility to incur some additional cost. However, for the configurations studied in this work, we observed up to 13% speedup for some meshing operations and up to 5.8% speedup over the entire application runtime compared to hand-optimized code. Moreover, we show that by using different task creation strategies, the overhead relative to straightforward task execution models can be improved dramatically, by as much as 1200%, without compromising portability or functionality.
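The abstract does not specify which task creation strategies were compared, so the sketch below uses chunking (one task per contiguous block of `grain` items) as a common, assumed example, contrasted with the straightforward one-task-per-item model. The helpers `make_per_item_tasks`, `make_chunked_tasks`, `run`, `square_into`, and the parameter `grain` are hypothetical; any timings the script prints depend on the machine and are not meant to reproduce the reported figures.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Task = Callable[[], None]


def make_per_item_tasks(n: int, body: Callable[[int], None]) -> List[Task]:
    """Straightforward strategy: one task per work item."""
    return [lambda i=i: body(i) for i in range(n)]


def make_chunked_tasks(n: int, body: Callable[[int], None], grain: int) -> List[Task]:
    """Chunked strategy (assumed example): one task per block of `grain` items."""
    if grain < 1:
        raise ValueError("grain must be positive")

    def chunk(lo: int, hi: int) -> None:
        for i in range(lo, hi):
            body(i)

    return [lambda lo=lo: chunk(lo, min(n, lo + grain)) for lo in range(0, n, grain)]


def run(tasks: List[Task], workers: int = 4) -> None:
    """Execute all tasks on a thread pool and wait for completion."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for future in [pool.submit(task) for task in tasks]:
            future.result()


def square_into(out: List[int]) -> Callable[[int], None]:
    def body(i: int) -> None:
        out[i] = i * i  # each index is written by exactly one task, so no lock is needed
    return body


if __name__ == "__main__":
    n, grain = 10_000, 256
    for name, make in (("per-item", lambda b: make_per_item_tasks(n, b)),
                       ("chunked", lambda b: make_chunked_tasks(n, b, grain))):
        out = [0] * n
        tasks = make(square_into(out))
        start = time.perf_counter()
        run(tasks)
        elapsed = time.perf_counter() - start
        assert out == [i * i for i in range(n)]
        print(f"{name}: {len(tasks)} tasks, {elapsed * 1e3:.1f} ms")
```

Both strategies compute the same result, but chunking submits about `n / grain` tasks instead of `n` (40 instead of 10,000 here), so per-task creation and scheduling costs are amortized over many work items.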
