Energy efficiency optimization of task-parallel codes on asymmetric architectures
Luis Costero, Francisco D. Igual, Katzalin Olcoz, Francisco Tirado
TL;DR
This work tackles energy efficiency for task-parallel codes on asymmetric ARM big.LITTLE systems through a set of runtime-driven policies implemented in Nanox. It introduces two policy families, FS (DVFS-based) and TS (scheduling-based), which adjust cluster frequencies or cluster usage, respectively, according to the scheduler state, aiming for energy savings with minimal performance impact. On an Exynos 5422 platform running a Cholesky factorization workload, FS3 improves energy efficiency by up to 29.3%. Frequency scaling of the LITTLE cluster reduces power but does not always improve energy efficiency, and the TS policies offer limited gains, with TS3 reaching up to 17.1% in select configurations. The findings indicate that scaling the big cluster frequency provides the strongest energy-efficiency benefits, and they motivate further work on broader benchmarks and automatic policy selection.
Abstract
We present a family of policies that, integrated within a runtime task scheduler (Nanox), aim to improve the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem either by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution points, depending on the internal status of the scheduler. Experimental results on an asymmetric SoC (Exynos 5422) and for a specific operation (Cholesky factorization) reveal gains of up to 29% in energy efficiency and considerable reductions in average power.
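To make the two mechanisms named in the abstract more concrete, the following is a minimal sketch of how a scheduler-state-driven decision could look. It is not the paper's FS or TS policies, whose exact rules are not given here: the `SchedulerState` type, the threshold rule, the core count, and the frequency values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SchedulerState:
    """Hypothetical snapshot of the scheduler's internal status."""
    ready_tasks: int  # tasks currently waiting to be mapped to a core


# Assumed values for illustration only; not taken from the paper.
N_BIG_CORES = 4
BIG_FREQ_LOW_KHZ = 800_000
BIG_FREQ_HIGH_KHZ = 1_600_000


def pick_big_frequency(state: SchedulerState) -> int:
    """DVFS-style decision (illustrative rule, not the paper's):
    run the big cluster fast only when there is enough ready work
    to keep every big core busy."""
    if state.ready_tasks >= N_BIG_CORES:
        return BIG_FREQ_HIGH_KHZ
    return BIG_FREQ_LOW_KHZ


def little_cluster_enabled(state: SchedulerState) -> bool:
    """Scheduling-style decision (illustrative rule, not the paper's):
    allow tasks to be mapped to LITTLE cores only when the ready
    tasks exceed what the big cores can absorb."""
    return state.ready_tasks > N_BIG_CORES


if __name__ == "__main__":
    # Evaluate both decisions for a few hypothetical scheduler states.
    for n in (1, 4, 9):
        s = SchedulerState(ready_tasks=n)
        print(n, pick_big_frequency(s), little_cluster_enabled(s))
```

In a real runtime, decisions like these would be evaluated at selected execution points (for example, when a task finishes) and then applied through the platform's DVFS interface or the scheduler's task-to-core mapping; both of those steps are left out of this sketch.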
