Tuning of Vectorization Parameters for Molecular Dynamics Simulations in AutoPas
Luis Gall, Samuel James Newcome, Fabio Alexander Gratl, Markus Mühlhäußer, Manish Kumar Mishra, Hans-Joachim Bungartz
TL;DR
This work studies how the order in which particle data is loaded into SIMD registers affects force calculations in AutoPas, and extends AutoPas' runtime auto-tuning to select the best vectorization order, by execution time or energy consumption, as simulation conditions change. By examining multiple neighbor identification algorithms, traversals, and data layouts, the authors show that the optimal vectorization pattern can vary both during a simulation and across workloads. Benchmarks show speedups and energy differences that depend on cutoffs, cluster sizes, and whether Newton's third law is used, which motivates dynamic tuning for performance portability. Overall, the findings indicate that runtime vectorization tuning can improve time-to-solution and energy efficiency in MD simulations across architectures and scenarios.
Abstract
Molecular Dynamics (MD) simulations can help scientists gain valuable insights into physical processes on an atomic scale. This work explores various SIMD vectorization techniques to improve the pairwise force calculation between molecules within the particle simulation library AutoPas. The focus lies on the order in which particle values are loaded into vector registers, with the goal of achieving the best performance in terms of execution time or energy consumption. As previous work indicates that the optimal MD algorithm can change during runtime, this paper investigates simulation-specific parameters such as particle density, as well as the impact of the neighbor identification algorithms, which distinguishes this work from related projects. Furthermore, AutoPas' dynamic tuning mechanism is extended to choose the optimal vectorization order during runtime. The benchmarks show that considering different particle interaction orders during runtime can lead to a considerable performance improvement for the force calculation compared to AutoPas' previous approach.
