
PAL -- Parallel active learning for machine-learned potentials

Chen Zhou, Marlen Neubert, Yuri Koide, Yumeng Zhang, Van-Quan Vuong, Tobias Schlöder, Stefanie Dehnen, Pascal Friederich

TL;DR

PAL introduces a scalable, MPI-based framework for parallel active learning that modularizes the workflow into five kernels (prediction, generator, oracle, training, controller) to enable asynchronous data generation, labeling, and model updates on HPC systems. By decoupling these components and running them concurrently, PAL improves resource utilization and reduces overhead on both CPU and GPU hardware. The authors validate PAL on four real-world scenarios (photodynamics, hydrogen atom transfer, inorganic clusters, and thermo-fluid flow optimization), demonstrating adaptable kernel configurations and substantial speedups while also identifying remaining bottlenecks and directions for future improvement. Overall, PAL provides an open, extensible platform for accelerating the development of high-fidelity ML potentials and related scientific models with minimal manual intervention.
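To make the five-kernel decomposition concrete, here is a minimal sketch of the kernels running concurrently and talking only through message queues. It is an illustration under stated assumptions, not PAL's API: Python threads and `queue.Queue` stand in for MPI processes and messages, every function and class name is hypothetical, the "oracle" is a cheap placeholder function, and "uncertainty" is simply the distance to the nearest already-labeled input.

```python
import queue
import random
import threading

STOP = object()  # sentinel passed down the pipeline so each kernel can shut down in order


class SharedModel:
    """Hypothetical toy surrogate: 'uncertainty' = distance to the nearest labeled input."""

    def __init__(self):
        self.lock = threading.Lock()  # prediction reads while training writes
        self.xs = []

    def uncertainty(self, x):
        with self.lock:
            if not self.xs:
                return float("inf")
            return min(abs(x - xi) for xi in self.xs)

    def retrain(self, x):
        with self.lock:
            self.xs.append(x)


def generator(out_q, n, seed=0):
    # Generator kernel: proposes candidate inputs (here: random points in [0, 10]).
    rng = random.Random(seed)
    for _ in range(n):
        out_q.put(rng.uniform(0.0, 10.0))
    out_q.put(STOP)


def prediction(in_q, out_q, model):
    # Prediction kernel: attaches an uncertainty estimate to each candidate.
    while (x := in_q.get()) is not STOP:
        out_q.put((x, model.uncertainty(x)))
    out_q.put(STOP)


def controller(in_q, out_q, threshold):
    # Controller kernel: forwards only uncertain candidates to the oracle.
    while (item := in_q.get()) is not STOP:
        x, unc = item
        if unc > threshold:
            out_q.put(x)
    out_q.put(STOP)


def oracle(in_q, out_q):
    # Oracle kernel: placeholder for an expensive reference calculation.
    while (x := in_q.get()) is not STOP:
        out_q.put((x, x ** 2))
    out_q.put(STOP)


def training(in_q, model, labeled):
    # Training kernel: grows the dataset and updates the shared model as labels arrive.
    while (item := in_q.get()) is not STOP:
        labeled.append(item)
        model.retrain(item[0])


def run(n=200, threshold=0.5):
    model, labeled = SharedModel(), []
    q_gen, q_pred, q_orc, q_train = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=generator, args=(q_gen, n)),
        threading.Thread(target=prediction, args=(q_gen, q_pred, model)),
        threading.Thread(target=controller, args=(q_pred, q_orc, threshold)),
        threading.Thread(target=oracle, args=(q_orc, q_train)),
        threading.Thread(target=training, args=(q_train, model, labeled)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return labeled


if __name__ == "__main__":
    print(len(run()), "samples labeled")
```

Because retraining runs concurrently with prediction, candidates may be scored with a slightly stale model, so two nearby points can both be selected before the first label arrives. This is the usual trade-off of asynchronous active learning: no kernel ever waits for a full serial loop to finish.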

Abstract

Constructing datasets representative of the target domain is essential for training effective machine learning models. Active learning (AL) is a promising method that iteratively extends training data to enhance model performance while minimizing data acquisition costs. However, current AL workflows often require human intervention and lack parallelism, leading to inefficiencies and underutilization of modern computational resources. In this work, we introduce PAL, an automated, modular, and parallel active learning library that integrates AL tasks and manages their execution and communication on shared- and distributed-memory systems using the Message Passing Interface (MPI). PAL provides users with the flexibility to design and customize all components of their active learning scenarios, including machine learning models with uncertainty estimation, oracles for ground-truth labeling, and strategies for exploring the target space. We demonstrate that PAL significantly reduces computational overhead and improves scalability, achieving substantial speed-ups through asynchronous parallelization on CPU and GPU hardware. Applications of PAL to several real-world scenarios, including ground-state reactions in biomolecular systems, excited-state dynamics of molecules, simulations of inorganic clusters, and thermo-fluid dynamics, illustrate its effectiveness in accelerating the development of machine learning models. Our results show that PAL enables efficient utilization of high-performance computing resources in active learning workflows, fostering advancements in scientific research and engineering applications.
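The abstract mentions machine learning models with uncertainty estimation that decide which samples are sent to the oracle. One widely used way to obtain such an estimate is query by committee: train several models and treat their disagreement as uncertainty. The sketch below assumes that approach (the abstract does not say which estimator a given PAL application uses); the committee members are hand-written toy functions standing in for independently trained models, and all names are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def committee_uncertainty(models: Sequence[Model], x: float) -> Tuple[float, float]:
    """Mean prediction and committee disagreement (sample standard deviation) at x."""
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.stdev(preds)


def select_for_labeling(models: Sequence[Model], candidates: Sequence[float],
                        threshold: float) -> List[float]:
    """Keep only candidates on which the committee disagrees by more than `threshold`."""
    return [x for x in candidates if committee_uncertainty(models, x)[1] > threshold]


# Toy committee: members agree near x = 0 and diverge as |x| grows
# (their sample standard deviation is exactly 0.1 * x**2).
committee = [
    lambda x: x,
    lambda x: x + 0.1 * x * x,
    lambda x: x - 0.1 * x * x,
]

print(select_for_labeling(committee, [0.0, 1.0, 2.0, 3.0, 5.0], threshold=0.5))
# prints [3.0, 5.0]
```

Only the two candidates where the members disagree strongly are forwarded for labeling; in a real workflow the threshold trades labeling cost against how aggressively the training set grows.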

Paper Structure

This paper contains 24 sections, 9 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Comparison of (a) a conventional (serial) active learning workflow and (b) our parallel active learning workflow, PAL. a) Classical active learning workflow, in which the different tasks, i.e., exploration of the input space using the generation and prediction kernels, labeling of samples using the oracle kernel, and training of the ML model, are performed iteratively and sequentially. b) PAL modularizes, decouples, and parallelizes data generation, oracle labeling, and ML training.
  • Figure 2: The computational architecture of the PAL workflow. Multiple boxes indicate parallelization of multiple instances of each kernel. The arrows illustrate information flow between the kernels orchestrated by the two controller sub-kernels. One dedicated controller sub-kernel ensures high-frequency communication between generation and prediction kernels.
  • Figure 3: Examples of PAL applications. a) Photodynamics simulations. b) Hydrogen atom transfer reaction simulations. c) Atomistic simulation of inorganic clusters. d) Thermo-fluid flow properties optimization.
  • Figure 4: Methods and data flow of PAL.
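Figure 2 notes that one controller sub-kernel is dedicated to high-frequency communication between the generation and prediction kernels. A common pattern for this kind of exchange is gather, predict once as a batch, scatter. The sketch below assumes that pattern and is not taken from PAL: plain function calls replace MPI messages (in mpi4py the analogous collectives would be `comm.gather` and `comm.scatter`), and `Walker`, `exchange_round`, and the harmonic "force" model are hypothetical.

```python
from typing import Callable, List, Sequence

BatchPredict = Callable[[Sequence[float]], List[float]]


class Walker:
    """Hypothetical generator kernel: a 1-D walker that moves along predicted forces."""

    def __init__(self, x0: float, step: float = 0.1):
        self.x = x0
        self.step = step

    def propose(self) -> float:
        return self.x

    def advance(self, force: float) -> None:
        self.x += self.step * force


def exchange_round(walkers: Sequence[Walker], predict: BatchPredict) -> List[float]:
    """One generator <-> prediction exchange handled by the controller."""
    batch = [w.propose() for w in walkers]   # gather inputs from all generators
    forces = predict(batch)                  # one batched call to the prediction kernel
    for w, f in zip(walkers, forces):        # scatter results back to their generators
        w.advance(f)
    return forces


if __name__ == "__main__":
    walkers = [Walker(1.0), Walker(-2.0), Walker(0.5)]
    for _ in range(10):
        exchange_round(walkers, lambda xs: [-x for x in xs])  # toy harmonic force
    print([round(w.x, 4) for w in walkers])
```

Batching means the prediction model is called once per round rather than once per generator, which is what makes frequent generator-prediction round trips affordable, especially when the model runs on a GPU.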