Making PLUMED fly: a tutorial on optimizing performance

Daniele Rapetti, Massimiliano Bonomi, Carlo Camilloni, Giovanni Bussi, Gareth A. Tribello

TL;DR

This work addresses the need for robust, reproducible performance assessment of PLUMED as it handles increasingly heavy calculations. It introduces the plumed benchmark tool and showcases vectorized, linked-list-based, and parallelization-enabled optimizations across distances, angles, torsions, symmetry functions, and Steinhardt parameters, with practical guidance on using MASK, D_MAX, and OpenMP/MPI tuning. The key contributions are a reliable benchmarking workflow, detailed performance benchmarks, and a set of optimization strategies that can be re-implemented by others, along with advocacy for integrating benchmarking into development workflows. The practical impact is a transparent, portable framework for evaluating and improving PLUMED performance across hardware and software environments, potentially aided by automated pipelines and modern numerical backends.

Abstract

PLUMED is an open-source software package that is widely used for analyzing and enhancing molecular dynamics simulations and that works in conjunction with most available molecular dynamics codes. While the computational cost of PLUMED calculations is typically negligible compared to the molecular dynamics code's force evaluation, the software is increasingly being employed for more computationally demanding tasks where performance optimization becomes critical. In this tutorial, we describe a recently implemented tool that can be used to reliably measure code performance. We then use this tool to generate detailed performance benchmarks that show how calculations of large numbers of distances, angles, or torsions can be optimized by using vector-based commands rather than individual scalar operations. We then present benchmarks that illustrate how to optimize calculations of atomic order parameters and secondary structure variables. Throughout the tutorial and in our implementations, we endeavor to explain the algorithmic tricks that are being used to optimize the calculations, so that others can make use of these prescriptions both when they are using PLUMED and when they are writing their own codes.

Paper Structure

This paper contains 11 sections, 9 equations, 15 figures.

Figures (15)

  • Figure 1: Interaction of PLUMED with molecular dynamics codes. PLUMED is able to calculate functions of the atomic positions and apply forces to atoms by passing data to and from the underlying MD code.
  • Figure 2: Data communications between PLUMED actions. The left panel illustrates how the constituent actions in the first example input in this paper evaluate the bias function. The right panel shows how data is passed between actions when forces are evaluated using the chain rule. These figures were made by using the PLUMED command plumed show_graph, which outputs a Mermaid diagram.
  • Figure 3: Time per step as a function of the number of distances that are being computed. The blue, orange, light and dark lines indicate the cost of running the calculation with 1, 2, 4 and 8 OpenMP threads respectively. The top panel indicates the cost of calculating the distances only, while the bottom panel indicates the additional cost that comes if you apply a force on the computed distances and also need to calculate derivatives.
  • Figure 4: Time taken for a single PLUMED step as a function of the number of distances (left), angles (center) and torsions (right) that are being computed. The top panels show the cost of just calculating these quantities; the bottom panels show the cost of calculating them and applying a force on the variables. Calculations were run on 1 to 32 OpenMP threads. The legend indicates the number of threads that was used to produce each of the lines.
  • Figure 5: Time taken for a single PLUMED step as a function of the number of torsions that are being computed. The top panels show how the cost of calculating the torsions increases, while the bottom panels show how the cost of calculating the torsions and applying a force on these quantities changes. All the calculations that were used to generate the solid lines for graphs in the left column were run on 8 processors. For the blue line all processors communicated via OpenMP, while the orange line shows the result that was obtained when communication between the 8 processors was managed using MPI. The green and red lines show the results obtained when the two communication protocols are mixed. The red line shows timings that are obtained by having four MPI processes that each run on two OpenMP threads, while the green line indicates the result that is obtained by having two MPI processes running on four OpenMP threads each. The orange and green dashed lines are results obtained when you run with 2 and 4 OpenMP threads respectively. The lines on the graphs in the right column were obtained from calculations that were parallelized over 1 to 32 MPI processes.
  • ...and 10 more figures