Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry

Hyunseung Kim, Haeyeon Choi, Dongju Kang, Won Bo Lee, Jonggeol Na

TL;DR

The paper tackles materials extrapolation by introducing RL-guided combinatorial chemistry (RL-CC), a BRICS-based, policy-driven molecular generator trained with PPO to assemble target molecules from fragments. It theoretically argues that distribution-learning models struggle to extrapolate beyond training data and empirically shows RL-CC can uncover extreme-property molecules, including multi-target hits and drug-like candidates, while maintaining 100% chemical validity. The approach demonstrates practical impact through protein docking and HIV inhibitor applications and explores expansion to organic materials, while acknowledging limitations such as target-dependent retraining and sparse rewards, with proposed future directions like meta-learning and hierarchical reinforcement learning. Overall, RL-CC provides a principled route to materials discovery beyond observed data distributions by coupling rule-based chemistry with learned fragment-selection policies.

Abstract

The goal of most materials discovery is to find materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop reinforcement learning-guided combinatorial chemistry, a rule-based molecular designer driven by a trained policy that selects subsequent molecular fragments to build a target molecule. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown molecules with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better compounds than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 molecules that hit all seven targets and 7,629 molecules that hit five targets out of 100,000 trials, whereas the probability distribution-learning models failed. Moreover, every molecule generated under the binding rules of molecular fragments was confirmed to be chemically valid (100% validity). To illustrate the performance on actual problems, we also demonstrate that our model works well in two practical applications: discovering protein docking molecules and HIV inhibitors.
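
To put the reported counts in perspective, the short sketch below converts them into hit rates per 100,000 trials. The counts come from the abstract; the function name `hit_rate_percent` is only an illustrative helper, not something from the paper.

```python
# Counts reported in the abstract.
TRIALS = 100_000
SEVEN_TARGET_HITS = 1_315   # molecules hitting all seven targets
FIVE_TARGET_HITS = 7_629    # molecules hitting five targets


def hit_rate_percent(hits: int, trials: int) -> float:
    """Return the share of trials that produced a hit, in percent."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 100 * hits / trials


if __name__ == "__main__":
    print(f"seven-target hit rate: {hit_rate_percent(SEVEN_TARGET_HITS, TRIALS):.3f}%")
    print(f"five-target hit rate:  {hit_rate_percent(FIVE_TARGET_HITS, TRIALS):.3f}%")
```

These work out to roughly 1.3% and 7.6% of trials, respectively.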

Paper Structure (18 sections, 8 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Probability distribution-learning models for molecular generation. (a) Data distribution of logP-TPSA. The pink dots denote the molecules in the MOSES [polykovskiy2020moses] training data. The other colored dots denote the molecules generated by the MOSES baseline models, which were trained on the MOSES training data. Since the MOSES baseline models are probability distribution-learning models such as NMT, GAN, VAE, and AAE, the distribution of generated molecules approximates the distribution of their training data. The magenta triangles and blue diamonds indicate real molecules in the ChEMBL [mendez2019chembl] database whose properties lie outside the MOSES training data distribution. ① CHEMBL3216345; ② CHEMBL3230084; ③ CHEMBL3358630; ④ CHEMBL300801; ⑤ CHEMBL501130; ⑥ CHEMBL52004. (b-d) Types of inverse molecular designers. $X_i$, $Y_i$, and $z$ denote the $i$-th molecular structure, the properties of the $i$-th molecule, and the latent code, respectively.
  • Figure 2: Overview of RL-guided combinatorial chemistry with BRICS. (a) Training process. (b) Modified BRICS [degen2008BRICS] fragment combination rules. Here, the modified BRICS rules as implemented in RDKit [rdkit] version 2020.09.1.0 are adopted. This figure is modified from Degen et al., 2008, ChemMedChem, 3(10): 1503-1507 [degen2008BRICS], with permission of Wiley-VCH GmbH. (c) Types of tasks. Task type A is to discover molecules that hit specific values of the given target properties, and Task type B is to discover molecules that maximize the given target properties. (d) Fragment set $B$. Here, $B\cup\{end\}$ is defined as the action space.
  • Figure 3: Inference process for molecular generation. (a) An example of a molecular generation process. (b) Property changes for generated molecules.
  • Figure 4: Targets for materials extrapolation. The PubChem SARS-CoV-2 clinical trials dataset [PubChemCovid19] is more widely distributed than the ChEMBL training dataset [mendez2019chembl]. The properties of five molecules in the PubChem SARS-CoV-2 clinical trials dataset that deviated from the logP-TPSA distribution of the ChEMBL training dataset were set as extrapolation targets C1 to C5, and the properties of five molecules that deviated from the TPSA-QED distribution were set as extrapolation targets C6 to C10.
  • Figure 5: Quality benchmarks of generated molecules in materials extrapolation. Number of chemically valid molecules, 5 target-hitting molecules (logP, TPSA, QED, HBA, and HBD), and 7 target-hitting molecules (logP, TPSA, QED, HBA, HBD, MW, and DRD2).
  • ...and 3 more figures
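
The Figure 2 caption defines the action space as $B\cup\{end\}$, with fragments joined under rule-based combination constraints. The toy sketch below illustrates that idea in plain Python. Everything in it is an assumption made for illustration: the fragment names, the attachment-point labels, the pairing table, and the helper functions are hypothetical and are not the real BRICS rules, the RDKit API, or the authors' implementation. The policy is passed in as a simple callable standing in for a trained PPO policy.

```python
from typing import Callable, List, Tuple

# Hypothetical fragment set B: fragment name -> attachment-point labels.
FRAGMENTS = {
    "frag_a": (1,),
    "frag_b": (3, 4),
    "frag_c": (5,),
    "frag_d": (3,),
}
# Hypothetical pairing rules: which attachment labels may bond to each other.
ALLOWED_PAIRS = {frozenset((1, 3)), frozenset((4, 5))}
END = "end"  # the extra "end" action that terminates generation


def can_bind(x: int, y: int) -> bool:
    """True if two attachment labels are allowed to bond under the toy rules."""
    return frozenset((x, y)) in ALLOWED_PAIRS


def valid_actions(open_sites: List[int]) -> List[str]:
    """Fragments that can bond to at least one open site, plus END."""
    actions = [
        name
        for name, labels in FRAGMENTS.items()
        if any(can_bind(s, l) for s in open_sites for l in labels)
    ]
    return actions + [END]


def attach(open_sites: List[int], name: str) -> List[int]:
    """Bond a fragment at the first compatible site; return the new open sites."""
    labels = list(FRAGMENTS[name])
    for i, site in enumerate(open_sites):
        for j, label in enumerate(labels):
            if can_bind(site, label):
                # Both bonded labels are consumed; the rest stay open.
                return open_sites[:i] + open_sites[i + 1:] + labels[:j] + labels[j + 1:]
    raise ValueError(f"{name} cannot bind to any open site")


def rollout(
    start: str, policy: Callable[[List[str]], str], max_steps: int = 4
) -> Tuple[List[str], List[int]]:
    """Assemble fragments step by step until the policy picks END."""
    history = [start]
    open_sites = list(FRAGMENTS[start])
    for _ in range(max_steps):
        action = policy(valid_actions(open_sites))
        if action == END:
            break
        open_sites = attach(open_sites, action)
        history.append(action)
    return history, open_sites


if __name__ == "__main__":
    # A trivial stand-in policy: always take the first valid action.
    print(rollout("frag_a", lambda actions: actions[0]))
```

Because only rule-compatible fragments are ever offered to the policy in this sketch, every assembled structure satisfies the toy pairing rules by construction, which mirrors the intuition behind the validity claim in the abstract.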