Table of Contents
Fetching ...

A GPU-accelerated Molecular Docking Workflow with Kubernetes and Apache Airflow

Daniel Medeiros, Gabin Schieffer, Jacob Wahlgren, Ivy Peng

TL;DR

This study investigates the transition and deployment of a GPU-accelerated molecular docking workflow that was designed for HPC systems onto a cloud-native environment with Kubernetes and Apache Airflow and provides a DAG-based implementation in Apache Airflow.

Abstract

Complex workflows play a critical role in accelerating scientific discovery. In many scientific domains, efficient workflow management can lead to faster scientific output and broader user groups. Workflows that can leverage resources across the boundary between cloud and HPC are a strong driver for the convergence of HPC and cloud. This study investigates the transition and deployment of a GPU-accelerated molecular docking workflow that was designed for HPC systems onto a cloud-native environment with Kubernetes and Apache Airflow. The case study focuses on state-of-of-the-art molecular docking software for drug discovery. We provide a DAG-based implementation in Apache Airflow and technical details for GPU-accelerated deployment. We evaluated the workflow using the SWEETLEAD bioinformatics dataset and executed it in a Cloud environment with heterogeneous computing resources. Our workflow can effectively overlap different stages when mapped onto different computing resources.

A GPU-accelerated Molecular Docking Workflow with Kubernetes and Apache Airflow

TL;DR

This study investigates the transition and deployment of a GPU-accelerated molecular docking workflow that was designed for HPC systems onto a cloud-native environment with Kubernetes and Apache Airflow and provides a DAG-based implementation in Apache Airflow.

Abstract

Complex workflows play a critical role in accelerating scientific discovery. In many scientific domains, efficient workflow management can lead to faster scientific output and broader user groups. Workflows that can leverage resources across the boundary between cloud and HPC are a strong driver for the convergence of HPC and cloud. This study investigates the transition and deployment of a GPU-accelerated molecular docking workflow that was designed for HPC systems onto a cloud-native environment with Kubernetes and Apache Airflow. The case study focuses on state-of-of-the-art molecular docking software for drug discovery. We provide a DAG-based implementation in Apache Airflow and technical details for GPU-accelerated deployment. We evaluated the workflow using the SWEETLEAD bioinformatics dataset and executed it in a Cloud environment with heterogeneous computing resources. Our workflow can effectively overlap different stages when mapped onto different computing resources.

Paper Structure

This paper contains 15 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: General architecture of Apache Airflow, and its interaction with a Kubernetes cluster. Arrows represent communication between components.
  • Figure 2: The DAG of an elementary molecular docking workflow: a single ligand is docked onto a single receptor. Resource requirements for each task is either GPU or CPU.
  • Figure 3: A virtual screening process, where a single protein receptor is identified beforehand (Fixed receptor), and millions of ligand molecules are evaluated against the receptor using molecular docking methods.
  • Figure 4: Virtual screening workflow. The ligand dataset is split in fixed-size batches (①). Then, workers perform docking independently on each batch (②). The results are then gathered for all batches, and post-processed to extract relevant domain-specific information (③).
  • Figure 5: Our DAG for the AutoDock-GPU workflow on Apache Airflow. The blue rectangle indicates a task group, whose tasks are executed once for each batch.
  • ...and 4 more figures