Response Matching for generating materials and molecules

Bingqing Cheng

TL;DR

This work addresses the challenge of generating chemically and structurally valid materials and molecules while respecting locality and the permutation, translation, rotation, and periodic boundary condition (PBC) invariances. It introduces Response Matching (RM), a diffusion-like denoising framework in which a machine-learned interatomic potential predicts fictitious forces and stresses in response to coordinate noise, and relaxation is guided by a pseudo-energy surface $\tilde{E}$. RM treats molecular and bulk-material generation within a single, locality-aware framework and is demonstrated on QM7b, Materials Project structures, and one-shot learning from a single diamond configuration, where it generates valid structures and $\tilde{E}$ provides a useful screening signal. The approach is presented as efficient and scalable for generation and rapid screening, with potential extensions to property conditioning, space-group priors, and alchemical element swaps.

Abstract

Machine learning has recently emerged as a powerful tool for generating new molecular and material structures. The success of state-of-the-art models stems from their ability to incorporate physical symmetries, such as translation, rotation, and periodicity. Here, we present a novel generative method called Response Matching (RM), which leverages the fact that each stable material or molecule exists at the minimum of its potential energy surface. Consequently, any perturbation induces a response in energy and stress, driving the structure back to equilibrium. Matching such a response is closely related to score matching in diffusion models. By employing a combination of a machine learning interatomic potential and random structure search as the denoising model, RM exploits the locality of atomic interactions and inherently respects permutation, translation, rotation, and periodic invariances. RM is the first model to handle both molecules and bulk materials under the same framework. We demonstrate the efficiency and generalization of RM across three systems: a small organic molecular dataset, stable crystals from the Materials Project, and one-shot learning on a single diamond configuration.
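
The idea of matching a restoring response can be illustrated with a deliberately tiny toy, which is not the model used in the paper. The sketch below assumes a two-atom "molecule" with a hypothetical equilibrium bond length `R0`, assumes the target response to a bond-length perturbation is simply the negative displacement, and fits a harmonic pseudo-potential by least squares. All names (`sample_responses`, `fit_pseudo_potential`, `denoise`) are illustrative. Because the fitted pseudo-energy depends only on the interatomic distance, it is invariant to translation, rotation, and atom permutation by construction.

```python
import math
import random

R0 = 1.5  # hypothetical equilibrium bond length of the toy dimer


def sample_responses(n, sigma=0.2, seed=0):
    """Perturb the equilibrium bond length; return noisy lengths and target forces."""
    rng = random.Random(seed)
    rs, fs = [], []
    for _ in range(n):
        delta = rng.gauss(0.0, sigma)
        rs.append(R0 + delta)
        fs.append(-delta)  # assumed response: pull back toward equilibrium
    return rs, fs


def fit_pseudo_potential(rs, fs):
    """Least-squares fit of f(r) = -k (r - r_eq), i.e. E(r) = 0.5 k (r - r_eq)^2."""
    n = len(rs)
    mean_r = sum(rs) / n
    mean_f = sum(fs) / n
    cov = sum((r - mean_r) * (f - mean_f) for r, f in zip(rs, fs))
    var = sum((r - mean_r) ** 2 for r in rs)
    slope = cov / var
    intercept = mean_f - slope * mean_r
    k = -slope
    r_eq = intercept / k
    return k, r_eq


def denoise(atom_a, atom_b, k, r_eq, step=0.1, n_steps=200):
    """Relax two atoms by following the fitted force along the bond."""
    a, b = list(atom_a), list(atom_b)
    for _ in range(n_steps):
        bond = [bj - aj for aj, bj in zip(a, b)]
        r = math.sqrt(sum(c * c for c in bond))
        f_mag = -k * (r - r_eq)  # positive when the bond is too short
        for j in range(3):
            shift = 0.5 * step * f_mag * bond[j] / r
            b[j] += shift
            a[j] -= shift
    return a, b


rs, fs = sample_responses(500)
k, r_eq = fit_pseudo_potential(rs, fs)
a, b = denoise((0.0, 0.0, 0.0), (3.0, 1.0, -2.0), k, r_eq)
print(round(math.dist(a, b), 6))
```

Starting from an arbitrary stretched configuration, the relaxed bond length should end up close to the assumed `R0`. In the paper, the harmonic fit is replaced by a machine learning interatomic potential, and the starting configurations come from random structure search.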

Paper Structure

This paper contains 13 sections, 12 equations, 5 figures.

Figures (5)

  • Figure 1: Comparison between the actual atomization energy ($E_{at}$) and the pseudo energy ($\tilde{E}$) predicted by the RM model for small molecules in the QM7b data set. Panel a compares the energies per molecule. Panel b compares the energies per atom for the molecules with the most common compositions in QM7b. The Pearson correlation coefficients $R$ are given in the legends.
  • Figure 2: Illustrations of small molecules generated using the RM model. Panel a shows the percentage of molecules with a given composition that pass the chemical feasibility checks of PoseBusters (Buttenschoen et al., 2024). The last column gives the percentage that pass all the checks. Panel b contains selected molecular configurations. The carbon, oxygen, nitrogen, sulfur, chlorine, and hydrogen atoms are colored black, red, blue, yellow, green, and white, respectively.
  • Figure 3: The similarity of chemical elements visualized using the first two principal components (PCs) of the CACE embedding matrix $\theta$ in the RM denoising model. Each element is colored according to its chemical group. The size of each symbol indicates the size of the element. The noble gas element Ne lies outside the plot.
  • Figure 4: The pseudo convex hull of the generated Li-S structures at different Li fractions. The colored markers denote generated structures that match known structures in the Materials Project (Jain et al., 2013) according to StructureMatcher from pymatgen (Ong et al., 2013); the material IDs are given in the legend. The filled (hollow) symbols indicate structures that are in (not in) the training set.
  • Figure 5: The pseudo energy of the generated carbon structures with different molar volumes. The blue markers represent cubic diamond structures, the red markers indicate hexagonal diamonds, and the purple markers denote diamonds with stacking faults. Graphite structures lie along the band pointed to by the black line.
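
As a rough illustration of how a pseudo convex hull like the one in Figure 4 could be assembled, the sketch below computes the lower convex hull of (Li fraction, pseudo energy) pairs and the distance of a structure above it. This is a generic monotone-chain construction, not the paper's code. The function names are hypothetical, and the numbers in `example` are made up for illustration and are not results from the paper.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_convex_hull(points):
    """Return the lower hull of (composition, energy) points, sorted by composition."""
    # Keep only the lowest-energy structure at each composition.
    lowest = {}
    for x, e in points:
        if x not in lowest or e < lowest[x]:
            lowest[x] = e
    hull = []
    for p in sorted(lowest.items()):
        # Drop the previous vertex while it lies on or above the new segment.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x, e, hull):
    """Energy of a structure relative to the hull at the same composition."""
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            e_hull = e0 + (e1 - e0) * (x - x0) / (x1 - x0)
            return e - e_hull
    raise ValueError("composition outside the range covered by the hull")


# Made-up (Li fraction, pseudo energy per atom) pairs, for illustration only.
example = [(0.0, 0.0), (0.25, -0.1), (0.5, -0.4), (0.5, -0.2), (0.75, -0.1), (1.0, 0.0)]
hull = lower_convex_hull(example)
print(hull)
print(energy_above_hull(0.25, -0.1, hull))
```

Structures on or near the hull would be the natural candidates for a closer look, which is the sense in which the pseudo energy can serve as a screening signal.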