
Stream parallel skeleton optimization

Marco Aldinucci, Marco Danelutto

TL;DR

The paper tackles the optimization of stream-parallel skeleton compositions. It models the service time $T_s(\cdot)$ of each skeleton with ideal implementation templates and derives rewriting rules that turn a composition into a normal form, defined as a single farm around sequential code. It proves that any skeleton composition $\Delta$ has an equivalent normal form $\overline{\Delta}$ with $\mathcal{F}[\Delta] = \mathcal{F}[\overline{\Delta}]$ (both compute the same function) and $T_s(\overline{\Delta}) \le T_s(\Delta)$ under certain timing assumptions, and it supports this with experimental results. The main contributions are the normal-form transformation, its formal proof via a decomposition based on the fringe of the skeleton tree (its sequential leaves), and demonstrated performance gains due to reduced overhead and improved load balancing. The findings matter for the design of skeleton-based languages and compilers: simple, farm-centric normal-form implementations can outperform more deeply nested, non-normal forms, especially under load imbalance. Future work includes analyzing resource requirements and extending the normal-form concept to data/stream skeletons.
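To make the normal-form construction concrete, here is a minimal Python sketch, assuming a hypothetical three-constructor skeleton tree (`Seq`, `Pipe`, `Farm`) that is not the paper's notation or any library's API. `fringe` collects the sequential leaves left to right, `semantics` plays the role of $\mathcal{F}[\cdot]$ by returning the per-item function a skeleton computes (a farm replicates its worker but does not alter it), and `normal_form` builds a single farm around the sequential composition of the fringe.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List, Union

# Hypothetical minimal skeleton tree; names are illustrative only.
@dataclass
class Seq:
    f: Callable[[int], int]  # sequential worker code applied to each stream item
    t: float                 # assumed per-item computation time

@dataclass
class Pipe:
    stages: List["Skel"]     # stages applied left to right

@dataclass
class Farm:
    worker: "Skel"           # replicated worker skeleton
    n_workers: int

Skel = Union[Seq, Pipe, Farm]

def fringe(s: Skel) -> List[Seq]:
    """Left-to-right list of the sequential leaves of the skeleton tree."""
    if isinstance(s, Seq):
        return [s]
    if isinstance(s, Pipe):
        return [leaf for stage in s.stages for leaf in fringe(stage)]
    return fringe(s.worker)  # a farm contributes its worker's fringe

def semantics(s: Skel) -> Callable[[int], int]:
    """Per-item function computed by the skeleton (stand-in for F[.])."""
    if isinstance(s, Seq):
        return s.f
    if isinstance(s, Farm):
        return semantics(s.worker)  # replication does not change the function
    fs = [semantics(stage) for stage in s.stages]
    return lambda x: reduce(lambda acc, g: g(acc), fs, x)

def normal_form(s: Skel, n_workers: int) -> Farm:
    """Build farm(seq(f1; ...; fk)) from the fringe f1..fk of s."""
    leaves = fringe(s)
    composed = lambda x: reduce(lambda acc, leaf: leaf.f(acc), leaves, x)
    # Assumption: the fused sequential code costs the sum of the leaf times.
    return Farm(Seq(composed, sum(leaf.t for leaf in leaves)), n_workers)

# Usage: a three-stage pipeline whose middle stage is already farmed.
delta = Pipe([Seq(lambda x: x + 1, 2.0),
              Farm(Seq(lambda x: x * 3, 6.0), 3),
              Seq(lambda x: x - 4, 1.0)])
nf = normal_form(delta, 4)
stream = list(range(10))
print([semantics(delta)(x) for x in stream] == [semantics(nf)(x) for x in stream])  # True
```

Because farms and pipelines only change how items are scheduled, not what is computed per item, both skeletons agree on every stream element. The sketch deliberately ignores ordering and timing; it only illustrates the functional equivalence $\mathcal{F}[\Delta] = \mathcal{F}[\overline{\Delta}]$.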

Abstract

We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always be rewritten into an equivalent "normal form" skeleton composition that delivers a service time equal to or even better than that of the original composition, while making better use of the processors involved. The normal form is defined as a single farm built around a sequential worker code. Experimental results validating this normal form are discussed.
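The service-time side of the claim can be sketched with the usual ideal cost model, which is an assumption here rather than the paper's exact formulas: a sequential stage costs its computation time, a pipeline is as slow as its slowest stage, and a farm with $n_w$ workers delivers $\max(T_e, T_s(\text{worker})/n_w, T_c)$, where $T_e$ and $T_c$ are hypothetical emitter and collector times. The processor count (one PE per sequential stage, plus an emitter and a collector per farm) is likewise an assumed template, and skeletons are encoded as plain tuples for brevity.

```python
from typing import List

# Hypothetical tuple encoding:
#   ("seq", t) | ("pipe", [stages]) | ("farm", worker, n_workers)
T_E = 0.1  # assumed emitter time per item
T_C = 0.1  # assumed collector time per item

def service_time(s) -> float:
    """Ideal service time under the assumed cost model."""
    kind = s[0]
    if kind == "seq":
        return s[1]
    if kind == "pipe":
        return max(service_time(stage) for stage in s[1])  # slowest stage
    _, worker, n = s
    return max(T_E, service_time(worker) / n, T_C)

def processors(s) -> int:
    """Assumed PE count: 1 per seq, plus emitter and collector per farm."""
    kind = s[0]
    if kind == "seq":
        return 1
    if kind == "pipe":
        return sum(processors(stage) for stage in s[1])
    _, worker, n = s
    return 2 + n * processors(worker)

def fringe_times(s) -> List[float]:
    """Computation times of the sequential leaves, left to right."""
    kind = s[0]
    if kind == "seq":
        return [s[1]]
    if kind == "pipe":
        return [t for stage in s[1] for t in fringe_times(stage)]
    return fringe_times(s[1])

def normal_form(s, n_workers: int):
    """Single farm around the fused fringe (assumed cost: sum of leaf times)."""
    return ("farm", ("seq", sum(fringe_times(s))), n_workers)

# Usage: compare a nested composition with its normal form on equal PEs.
delta = ("pipe", [("seq", 2.0), ("farm", ("seq", 6.0), 3), ("seq", 1.0)])
nf = normal_form(delta, processors(delta) - 2)  # spend the same PE budget
print(service_time(delta), processors(delta))  # 2.0 7
print(service_time(nf), processors(nf))        # 1.8 7
```

For this pipeline the model gives $T_s = 2.0$ on 7 PEs, while the normal form with 5 workers on the same 7 PEs gives $T_s = 9.0/5 = 1.8$. This is one illustrative instance under the assumed model, not a proof of the general inequality $T_s(\overline{\Delta}) \le T_s(\Delta)$.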

Paper Structure

This paper contains 10 sections, 1 equation, 3 figures.

Figures (3)

  • Figure 1: Rewriting rules
  • Figure 2: Normal form vs. non-normal forms. Table A: optimal number of processing elements for each run; Table B: same number of processing elements for each run ($T_s$: service time; $T_c$: completion time; $\#PE$: number of processing elements used; $\epsilon$: efficiency)
  • Figure 3: Experimental results: service time (in seconds) vs. number of processing elements used (left); service time (in seconds) vs. variance of the sequential skeleton time (right)