Efficient Parameter Tuning for a Structure-Based Virtual Screening HPC Application

Bruno Guindani, Davide Gadioli, Roberto Rocco, Danilo Ardagna, Gianluca Palermo

TL;DR

This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. The techniques extend sequential Bayesian Optimization with two parallel asynchronous approaches and integrate predictions from Machine Learning (ML) models to help comply with constraints.

Abstract

Virtual screening applications are highly parameterized to optimize the balance between quality and execution performance. While output quality is critical, the entire screening process must be completed within a reasonable time. In fact, a slight reduction in output accuracy may be acceptable when dealing with large datasets. Finding the optimal quality-throughput trade-off depends on the specific HPC system used and should be re-evaluated with each new deployment or significant code update. This paper presents two parallel autotuning techniques for constrained optimization in distributed High-Performance Computing (HPC) environments. These techniques extend sequential Bayesian Optimization (BO) with two parallel asynchronous approaches, and they integrate predictions from Machine Learning (ML) models to help comply with constraints. Our target application is LiGen, a real-world virtual screening software for drug discovery. The proposed methods address two relevant challenges: efficient exploration of the parameter space and performance measurement using domain-specific metrics and procedures. We conduct an experimental campaign comparing the two methods with a popular state-of-the-art autotuner. Results show that our methods find configurations that are, on average, up to 35-42% better than the ones found by the autotuner and the default expert-picked LiGen configuration.

Paper Structure

This paper contains 24 sections, 4 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: The LiGen application: all the MPI instances implement the same computation pipeline that works on different slabs of the input and output files. The virtual screening stage can offload computation to the node GPU(s).
  • Figure 2: Framework architecture.
  • Figure 3: PAMaliboo technique with $q=3$ parallel workers.
  • Figure 4: Simulated experiments: 10-experiment average of relevant quantities.
  • Figure 5: Simulated experiments: regret plot of one representative experiment.
  • ...and 2 more figures