Reclaiming Idle CPU Cycles on Kubernetes: Sparse-Domain Multiplexing for Concurrent MPI-CFD Simulations

Tianfang Xie

Abstract

When MPI-parallel simulations run on shared Kubernetes clusters, conventional CPU scheduling leaves the vast majority of provisioned cycles idle at synchronization barriers. This paper presents a multiplexing framework that reclaims this idle capacity by co-locating multiple simulations on the same cluster. PMPI-based duty-cycle profiling quantifies the per-rank idle fraction; proportional CPU allocation then allows a second simulation to execute concurrently with minimal overhead, yielding a 1.77x throughput gain. A Pareto sweep up to N=5 concurrent simulations shows throughput scaling to 3.74x, with a knee at N=3 offering the best efficiency-cost trade-off. An analytical model with a single fitted parameter predicts these gains to within +/-4%. A dynamic controller automates the full pipeline, from profiling through In-Place Pod Vertical Scaling (KEP-1287) to packing and fairness monitoring, achieving a 3.25x throughput gain for four simulations without manual intervention or pod restarts. To our knowledge, this is the first application of In-Place Pod Vertical Scaling to adjust the CPU allocations of running MPI processes. Experiments on an AWS cluster with OpenFOAM CFD confirm that the results hold under both concentric and standard graph-based (Scotch) mesh partitioning.
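
To make the profiling step concrete, the sketch below shows one way such a PMPI interposer can be written; it is a minimal example, not the authors' tool, and it assumes that rank idle time is dominated by blocking collectives and request completion (only MPI_Allreduce and MPI_Waitall are intercepted here). Compiled into a shared library and preloaded into the unmodified solver (e.g., via LD_PRELOAD), it prints each rank's duty cycle at MPI_Finalize.

```c
/*
 * Minimal PMPI interposition sketch for duty-cycle profiling.
 * Illustrative only: wrapper coverage and output format are assumptions,
 * not the paper's actual tooling.
 */
#include <mpi.h>
#include <stdio.h>

static double wait_time = 0.0;  /* seconds spent blocked inside MPI calls */
static double t_init    = 0.0;  /* wall-clock timestamp taken at MPI_Init */

int MPI_Init(int *argc, char ***argv)
{
    int rc = PMPI_Init(argc, argv);
    t_init = PMPI_Wtime();
    return rc;
}

/* Wrap the blocking calls that dominate synchronization time; a fuller
 * profiler would also cover MPI_Recv, MPI_Wait, MPI_Barrier, etc. */
int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                  MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    double t0 = PMPI_Wtime();
    int rc = PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
    wait_time += PMPI_Wtime() - t0;
    return rc;
}

int MPI_Waitall(int count, MPI_Request requests[], MPI_Status statuses[])
{
    double t0 = PMPI_Wtime();
    int rc = PMPI_Waitall(count, requests, statuses);
    wait_time += PMPI_Wtime() - t0;
    return rc;
}

int MPI_Finalize(void)
{
    double total = PMPI_Wtime() - t_init;
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Duty cycle: fraction of wall time spent computing rather than waiting. */
    printf("rank %d: duty cycle %.1f%%\n",
           rank, 100.0 * (total - wait_time) / total);
    return PMPI_Finalize();
}
```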

Figures (7)

  • Figure 1: Per-rank CPU duty cycle (498 834-cell mesh, 16 ranks, 200 iterations, 12-node cluster, concentric decomposition). Top: absolute time split between computation (solid) and MPI wait (light). Bottom: duty cycle percentage. Sparse ranks (blue, weight 1) average 5.0%, dense ranks (red, weight 15) average 19.4%.
  • Figure 2: (a) C-mesh block topology (6 hex blocks, 498 834 cells). (b) Concentric weight zones for 16-rank decomposition: dense ranks (weight 15) receive near-wall cells, sparse ranks (weight 1) receive far-field cells.
  • Figure 3: Manual concentric decomposition of the NACA 0012 mesh into 16 ranks. Cell centers are colored by processor assignment. Dense ranks (red, 68% of cells) occupy the near-wall region, medium ranks (orange, 23%) the intermediate zone, and sparse ranks (blue, 9%) the far-field.
  • Figure 4: (a) Wall-clock time per configuration (solid: Sim A, hatched: Sim B). (b) Total time to complete two simulations: sequential ($2\times$ single-sim, stacked) vs. concurrent (overlapping). Black line shows throughput gain (right axis). Concurrent execution nearly halves total time: C-2P $1.77\times$, C-2E $1.83\times$.
  • Figure 5: Pareto analysis for $N=1\ldots5$ concurrent proportional simulations. (a) Throughput gain (red line) and scheduling efficiency (blue bars). The dashed line shows ideal linear scaling. The knee point at $N=3$ offers 86% efficiency. (b) Per-simulation degradation (orange bars) and makespan (green line). (These metrics are sketched after this list.)
  • ...and 2 more figures
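
For reference, the throughput figures quoted above are consistent with the following definitions; this is a hedged reading of the captions, and the paper's exact formulas may differ. With $T_1$ the single-simulation wall-clock time and $T_N$ the makespan of $N$ concurrent simulations:

$$\text{gain}(N) = \frac{N\,T_1}{T_N}, \qquad \text{efficiency}(N) = \frac{\text{gain}(N)}{N} = \frac{T_1}{T_N}, \qquad \text{degradation}(N) = \frac{T_N}{T_1} - 1.$$

Under this reading, the reported $N=5$ gain of 3.74x corresponds to roughly 75% scheduling efficiency, and the 86% efficiency at the $N=3$ knee to a gain of about 2.6x.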