GROMACS Unplugged: How Power Capping and Frequency Shapes Performance on GPUs
Ayesha Afzal, Anna Kahler, Georg Hager, Gerhard Wellein
TL;DR
This study analyzes how GPU clock frequency and power capping influence GROMACS performance on four NVIDIA GPUs (A40, A100, L4, L40) using six biomolecular benchmarks and two synthetic workloads (Pi Solver and BabelStream). By combining frequency-tuning experiments with power-cap analyses, the authors map MD throughput (ns/day) across architectures. Small systems are highly frequency-sensitive, whereas large systems become memory bandwidth-limited, and high-end GPUs like the A100 maintain near-peak performance under reasonable power caps. The Pi Solver demonstrates ideal compute scalability with frequency, whereas BabelStream highlights memory-bound limits, providing context for interpreting MD performance. The results offer practical guidance for hardware selection and tuning in large-scale MD workflows under power constraints and establish a reproducible benchmarking framework for energy-aware MD computing.
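As an illustration of what a frequency and power-cap sweep can look like in practice, the following minimal sketch builds the `nvidia-smi` invocations for each setting. It is not the authors' tooling: the helper names, the GPU index, and the clock and power values are hypothetical, and applying the commands typically requires administrator privileges and values within the range the specific GPU supports.

```python
# Hypothetical helpers: they only build argument lists and do not run anything.

def frequency_sweep(gpu_index, clocks_mhz):
    """One nvidia-smi command per clock, locking min and max graphics clock to the same value."""
    return [
        ["nvidia-smi", "-i", str(gpu_index), "-lgc", f"{clock},{clock}"]
        for clock in clocks_mhz
    ]


def power_cap_sweep(gpu_index, caps_watt):
    """One nvidia-smi command per power limit (in watts)."""
    return [
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(cap)]
        for cap in caps_watt
    ]


def reset_clocks(gpu_index):
    """Command that releases a previously locked graphics clock."""
    return ["nvidia-smi", "-i", str(gpu_index), "-rgc"]


if __name__ == "__main__":
    # Example values are assumptions, not settings taken from the study.
    for cmd in frequency_sweep(0, [900, 1200, 1500]):
        print(" ".join(cmd))
    for cmd in power_cap_sweep(0, [150, 200, 250]):
        print(" ".join(cmd))
    print(" ".join(reset_clocks(0)))
```

In a real sweep, each command would be followed by a benchmark run at that setting, with the resulting throughput (ns/day for GROMACS) recorded before moving on to the next setting.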
Abstract
Molecular dynamics simulations are essential tools in computational biophysics, but their performance depends heavily on hardware choices and configuration. In this work, we present a comprehensive performance analysis of four NVIDIA GPU accelerators (A40, A100, L4, and L40) using six representative GROMACS biomolecular workloads alongside two synthetic benchmarks: Pi Solver (compute bound) and STREAM Triad (memory bound). We investigate how performance scales with GPU graphics clock frequency and how workloads respond to power capping. The two synthetic benchmarks define the extremes of frequency scaling: Pi Solver shows ideal compute scalability, while STREAM Triad reveals memory bandwidth limits, which places GROMACS's performance in context. Our results reveal distinct frequency scaling behaviors: smaller GROMACS systems exhibit strong frequency sensitivity, while larger systems saturate quickly, becoming increasingly memory bound. Under power capping, performance remains stable until architecture- and workload-specific thresholds are reached, with high-end GPUs like the A100 maintaining near-maximum performance even under reduced power budgets. Our findings provide practical guidance for selecting GPU hardware and optimizing GROMACS performance for large-scale MD workflows under power constraints.
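To make the contrast between the two synthetic benchmarks concrete, the sketch below shows the arithmetic each one performs, written as plain CPU-side Python rather than the GPU kernels used in the study. It assumes the Pi Solver is a numerical integration of 4/(1+x^2) over [0, 1], which is a common formulation but is not confirmed by the abstract; the function names are hypothetical.

```python
import math


def pi_midpoint(n_slices):
    """Approximate pi by midpoint-rule integration of 4/(1+x^2) on [0, 1].

    Every iteration is pure arithmetic on loop-local values with no array
    traffic, which is why a kernel of this shape is compute bound.
    """
    width = 1.0 / n_slices
    total = 0.0
    for i in range(n_slices):
        x = (i + 0.5) * width
        total += 4.0 / (1.0 + x * x)
    return total * width


def stream_triad(b, c, scalar):
    """STREAM Triad: a[i] = b[i] + scalar * c[i].

    Each element needs two loads and one store for only two floating-point
    operations, which is why a kernel of this shape is memory bound.
    """
    return [bi + scalar * ci for bi, ci in zip(b, c)]


if __name__ == "__main__":
    print(abs(pi_midpoint(1000) - math.pi) < 1e-6)
    print(stream_triad([1.0, 2.0], [3.0, 4.0], 2.0))
```

Because the first loop is limited by how fast arithmetic is issued, its runtime is expected to track the clock frequency closely, whereas the second is limited by data movement and gains little from a higher graphics clock once memory bandwidth is saturated.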
