
ChemFit: A concurrent framework for model parametrization

Moritz Sallermann, Amrita Goswami, Hannes Jónsson, Elvar Ö. Jónsson, Jorge R. Espinosa

Abstract

Parameter optimization in computational chemistry and physics often involves objective functions that are expensive to evaluate, noisy, non-differentiable, or composed of heterogeneous contributions originating from separate simulations. Gradient-free and black-box optimization algorithms are particularly well suited to such problems. However, interfacing simulation engines with parameter optimization libraries can be cumbersome, especially if the simulations are expensive and need to be run concurrently. Here, we introduce ChemFit, a flexible Python framework for the definition, composition, and massively concurrent evaluation of simulation-based objective functions, designed to operate in conjunction with these algorithms. The framework provides abstractions for heterogeneous objective terms, file-based and in-memory quantity evaluation, and explicit control over concurrency across both objective components and parameter guesses. We demonstrate the versatility of ChemFit in two applications: (i) the determination of Lennard-Jones parameters for liquid argon from experimental density data over a range of temperatures and pressures, using molecular-dynamics simulations, and (ii) the parameterization of a polarizable force field for H2O against the structure of small ice clusters obtained from density functional theory calculations. These examples illustrate how ChemFit enables scalable, reproducible, and optimizer-agnostic parameter fitting.
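The idea of concurrency "across both objective components and parameter guesses" can be illustrated with a minimal sketch built only on the Python standard library. This is not ChemFit's API: the names `make_term` and `evaluate_guesses` are hypothetical, and the quadratic terms merely stand in for expensive simulations. A thread pool is assumed here because simulation-backed terms typically spend their time waiting on external processes; CPU-bound in-memory terms would more likely call for a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Params = Dict[str, float]
Term = Callable[[Params], float]


def make_term(target: float) -> Term:
    """Hypothetical objective term: a cheap stand-in for one simulation."""
    def term(params: Params) -> float:
        return (params["sigma"] - target) ** 2
    return term


def evaluate_guesses(
    terms: Sequence[Term],
    weights: Sequence[float],
    guesses: Sequence[Params],
    max_workers: int = 8,
) -> List[float]:
    """Return the weighted sum of all terms for every parameter guess.

    Every (guess, term) pair is submitted to one shared pool, so the
    work is spread over both guesses and objective components.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One row of futures per guess, one future per term.
        rows = [[pool.submit(term, guess) for term in terms] for guess in guesses]
        # Collect results in submission order and combine them per guess.
        return [
            sum(w * future.result() for w, future in zip(weights, row))
            for row in rows
        ]


if __name__ == "__main__":
    terms = [make_term(1.0), make_term(3.0)]
    guesses = [{"sigma": 1.0}, {"sigma": 3.0}]
    print(evaluate_guesses(terms, [1.0, 0.5], guesses))
```

Because the function takes a batch of guesses and returns a list of objective values, it would fit any optimizer that proposes several candidates per step, which is the sense in which such a layer can stay optimizer-agnostic.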

Paper Structure (18 sections, 15 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Total time taken to evaluate a combined objective function with ten individual terms ($y$-axis), versus the time it takes to evaluate each individual term ($x$-axis). Different colors correspond to different concurrency schemes. The solid lines serve as visual guides and are linear fits to the data points (evenly weighted in log-log space). The total number of available threads or processes, respectively, was 48.
  • Figure 2: Flowchart depicting the use of ChemFit with LAMMPS to fit the experimental argon densities. For each experimental data point, a FileBasedQuantityComputer is created, which generates an appropriate LAMMPS input script for each trial parameter set of $\sigma$ and $\varepsilon$. The FileBasedQuantityComputer also uses an output parser to extract the averaged densities from the log files written by LAMMPS.
  • Figure 3: A) "Trajectory" (brown dots) of the optimization in the space of reduced density ($x$-axis) and reduced temperature ($y$-axis) for one of the 139 experimental measurements of liquid argon by [streett_1969_experimental], at $p=340.23$ atm and $T=120.18$ K, with a measured density of 1.32 g/cm$^3$. The data are overlaid on the phase diagram of the Lennard-Jones system from [stephan_2019_thermophysical]. The initial guess is marked with a filled red circle, and the point with minimal root-mean-square deviation (RMSD) over all 139 experimental points is marked with a green star. B) RMSD versus optimization step. At each step, two points were run in tandem, so the total number of simulations is twice the number of steps. The points are colored according to the phase in which the majority of the 139 physical systems lie.
  • Figure 4: Bottom: RMSD obtained with the Kabsch algorithm [Kabsch1976, Kabsch1978] for the initial and optimal geometries, compared to reference geometries from DFT. Top: Snapshots of the initial, reference and optimal geometry of the water cage pentamer CAA. Dashed grey lines are a guide for the eye, connecting oxygen atoms that are less than $3.5$ Å apart.
  • Figure 5: Energy per atom, in eV, of the initial, optimal and reference (DFT) geometries of water clusters. Agreement of the optimal geometries with those obtained from DFT calculations with the BEEF-vdW functional is within $0.01$ eV per atom.
  • ...and 2 more figures
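
The reduced density and reduced temperature on the axes of Figure 3 depend on the trial parameters, which is why a single experimental state point traces a trajectory as $\sigma$ and $\varepsilon$ change. The sketch below assumes the standard Lennard-Jones reduced-unit convention, $\rho^* = n\sigma^3$ with number density $n$ and $T^* = k_\mathrm{B}T/\varepsilon$; the source does not spell out the convention. The function name `to_reduced_units` is hypothetical, and the values $\sigma = 3.4$ Å and $\varepsilon/k_\mathrm{B} = 120$ K in the usage example are illustrative inputs, not the fitted parameters from the paper.

```python
AVOGADRO = 6.02214076e23       # particles per mole (exact SI value)
ARGON_MOLAR_MASS = 39.948      # g/mol
ANGSTROM3_PER_CM3 = 1.0e24     # cubic angstroms in one cubic centimetre


def to_reduced_units(
    density_g_cm3: float,
    temperature_k: float,
    sigma_angstrom: float,
    epsilon_over_kb_k: float,
    molar_mass_g_mol: float = ARGON_MOLAR_MASS,
) -> tuple:
    """Map a state point to Lennard-Jones reduced units.

    Returns (rho_star, t_star), where rho_star = n * sigma**3 and
    t_star = T / (epsilon / k_B), with epsilon given in kelvin.
    """
    # Number density in particles per cubic angstrom.
    n = density_g_cm3 * AVOGADRO / molar_mass_g_mol / ANGSTROM3_PER_CM3
    rho_star = n * sigma_angstrom ** 3
    t_star = temperature_k / epsilon_over_kb_k
    return rho_star, t_star


if __name__ == "__main__":
    # State point quoted in the Figure 3 caption; sigma and epsilon are
    # illustrative trial values only.
    rho_star, t_star = to_reduced_units(1.32, 120.18, 3.4, 120.0)
    print(f"rho* = {rho_star:.3f}, T* = {t_star:.3f}")
```

Since $\rho^*$ scales with $\sigma^3$ and $T^*$ with $1/\varepsilon$, even small changes in the trial parameters can move a state point across a phase boundary of the Lennard-Jones system.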