Stream parallel skeleton optimization
Marco Aldinucci, Marco Danelutto
TL;DR
The paper addresses the optimization of compositions of stream-parallel skeletons. It models the service time $T_s(\cdot)$ of each skeleton with ideal templates and derives rewriting rules that produce a normal form, defined as a single farm around sequential code. It proves that any skeleton composition $\Delta$ has an equivalent normal form $\overline{\Delta}$ with $\mathcal{F}[\Delta] = \mathcal{F}[\overline{\Delta}]$ and $T_s(\overline{\Delta}) \le T_s(\Delta)$ under certain timing assumptions, and it supports this result with experiments. The main contributions are the normal-form transformation, its formal proof via fringe-based decomposition, and measured performance gains attributed to reduced overhead and improved load balancing. For the design of skeleton-based languages and compilers, the findings suggest that simple, farm-centric normal-form implementations can outperform more deeply nested, non-normal forms, especially under load imbalance. Future work includes analyzing resource requirements and extending the normal-form concept to data/stream skeletons.
Abstract
We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always be rewritten into an equivalent "normal form" skeleton composition, delivering a service time which is equal to or even better than the service time of the original skeleton composition, and achieving a better utilization of the processors used. The normal form is defined as a single farm built around a sequential worker code. Experimental results are discussed that validate this normal form.
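The idea of the normal form can be illustrated with a toy cost model. The sketch below is not the paper's formalization: it assumes a deliberately simplified ideal model in which a pipeline's service time is the maximum over its stages, a farm divides its worker's service time by the number of workers, and all communication, emitter, and collector costs are ignored. The names `Seq`, `Pipe`, `Farm`, `fringe`, `service_time`, `resources`, and `normal_form` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Seq:
    """Sequential code; `time` is the time to process one stream item."""
    name: str
    time: float


@dataclass(frozen=True)
class Pipe:
    """Pipeline of skeletons, applied left to right."""
    stages: Tuple["Skeleton", ...]


@dataclass(frozen=True)
class Farm:
    """Farm replicating `worker` over `workers` identical copies."""
    worker: "Skeleton"
    workers: int


Skeleton = Union[Seq, Pipe, Farm]


def fringe(s: Skeleton) -> List[Seq]:
    """Sequential leaves of the skeleton tree, in left-to-right order."""
    if isinstance(s, Seq):
        return [s]
    if isinstance(s, Pipe):
        return [leaf for stage in s.stages for leaf in fringe(stage)]
    return fringe(s.worker)


def service_time(s: Skeleton) -> float:
    """Ideal service time (assumption: no communication overhead)."""
    if isinstance(s, Seq):
        return s.time
    if isinstance(s, Pipe):
        return max(service_time(stage) for stage in s.stages)
    return service_time(s.worker) / s.workers


def resources(s: Skeleton) -> int:
    """Processors used (assumption: emitters/collectors are not counted)."""
    if isinstance(s, Seq):
        return 1
    if isinstance(s, Pipe):
        return sum(resources(stage) for stage in s.stages)
    return s.workers * resources(s.worker)


def normal_form(s: Skeleton, workers: int) -> Farm:
    """Single farm whose worker runs the whole fringe sequentially."""
    leaves = fringe(s)
    composed = Seq(";".join(leaf.name for leaf in leaves),
                   sum(leaf.time for leaf in leaves))
    return Farm(composed, workers)


# Unbalanced example: a 2-worker farm of f (4 time units) piped into g (6).
original = Pipe((Farm(Seq("f", 4.0), 2), Seq("g", 6.0)))
nf = normal_form(original, resources(original))

print(service_time(original), resources(original))
print(service_time(nf), resources(nf))
```

In this toy model the original composition is limited by its slowest stage, whereas the normal form spreads the total work of the fringe evenly over the same number of processors, which is the load-balancing intuition behind the result. Real implementations also pay communication costs, so the comparison holds only under the timing assumptions stated in the paper.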
