Enabling Scientific Workflow Scheduling Research in Non-Uniform Memory Access Architectures
Aurelio Vivas, Harold Castro
TL;DR
nFlows addresses NUMA-aware scheduling of data-intensive scientific workflows on HPC systems by providing a runtime that models, simulates, and executes workflows on NUMA-based machines. It extends Min-Min, HEFT, and FIFO with NUMA distance matrices to evaluate scheduling decisions under realistic data-locality constraints and data-movement costs. The paper details the architecture, the workflow and system models, the input/output formats, and a validation framework with scripts and case studies, and it acknowledges limitations and outlines future enhancements. The platform lets researchers study data locality, memory behavior, and in-memory workflow execution, and it is open source to support reproducibility and broader adoption in HPC scheduling research.
Abstract
Data-intensive scientific workflows increasingly rely on high-performance computing (HPC) systems, complementing traditional Grid and Cloud platforms. However, workflow scheduling on HPC infrastructures remains challenging due to the prevalence of non-uniform memory access (NUMA) architectures. These systems require schedulers to account for data locality not only across distributed environments but also within each node. Modern HPC nodes integrate multiple NUMA domains and heterogeneous memory regions, such as high-bandwidth memory (HBM) and DRAM, and frequently attach accelerators (GPUs or FPGAs) and network interface cards (NICs) to specific NUMA nodes. This design increases the variability of data-access latency and complicates the placement of both tasks and data. Despite these constraints, most workflow scheduling strategies were originally developed for Grid or Cloud environments and rarely incorporate NUMA-aware considerations. To address this gap, this work introduces nFlows, a NUMA-aware Workflow Execution Runtime System that enables the modeling, bare-metal execution, simulation, and validation of scheduling algorithms for data-intensive workflows on NUMA-based HPC systems. The system's design, implementation, and validation methodology are presented. nFlows supports the construction of simulation models and their direct execution on physical systems, enabling studies of NUMA effects on scheduling, the design of NUMA-aware algorithms, the analysis of data-movement behavior, the identification of performance bottlenecks, and the exploration of in-memory workflow execution.
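To illustrate how a NUMA distance matrix can enter a classic heuristic such as Min-Min, the following is a minimal Python sketch under stated assumptions; it is not the nFlows implementation. The Task fields, the read_time cost model (local read time scaled by distance/10, following the convention that a node's distance to itself is 10), the 10 GB/s local bandwidth, and the function names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

LOCAL_DISTANCE = 10  # convention: a NUMA node's distance to itself is 10


@dataclass(frozen=True)
class Task:
    name: str
    compute_s: float   # pure compute time in seconds (assumed known)
    input_bytes: int   # size of the input the task reads
    data_node: int     # NUMA node currently holding that input


def read_time(task, core_node, distance, local_bw):
    """Assumed cost model: local read time scaled by relative NUMA distance."""
    scale = distance[task.data_node][core_node] / LOCAL_DISTANCE
    return task.input_bytes / local_bw * scale


def numa_min_min(tasks, core_nodes, distance, local_bw=10e9):
    """Min-Min over independent ready tasks.

    core_nodes[i] is the NUMA node that core i belongs to.
    Returns a list of (task name, core index, finish time).
    """
    ready_at = [0.0] * len(core_nodes)  # time at which each core becomes free
    pending = list(tasks)
    schedule = []
    while pending:
        best = None
        # The smallest finish time over all (task, core) pairs equals the
        # minimum of each task's minimum completion time, which is Min-Min.
        for task in pending:
            for core, node in enumerate(core_nodes):
                finish = (ready_at[core] + task.compute_s
                          + read_time(task, node, distance, local_bw))
                if best is None or finish < best[0]:
                    best = (finish, task, core)
        finish, task, core = best
        ready_at[core] = finish
        pending.remove(task)
        schedule.append((task.name, core, finish))
    return schedule


# Two NUMA nodes with one core each; remote access is assumed twice as costly.
distance = [[10, 20], [20, 10]]
tasks = [
    Task("A", 1.0, 10_000_000_000, data_node=0),
    Task("B", 1.0, 10_000_000_000, data_node=1),
]
schedule = numa_min_min(tasks, core_nodes=[0, 1], distance=distance)
```

In this toy setup each task is placed on the core that is local to its input, because the distance-scaled read time makes the remote core strictly slower. With a uniform distance matrix the same code reduces to plain Min-Min, which is one simple way to compare NUMA-aware and NUMA-oblivious decisions.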
