Application of the Digital Annealer Unit in Optimizing Chemical Reaction Conditions for Enhanced Production Yields

Shih-Cheng Li, Pei-Hwa Wang, Jheng-Wei Su, Wei-Yin Chiang, Shih-Hsien Huang, Yen-Chu Lin, Chia-Ho Ou, Chih-Yu Chen

TL;DR

This paper addresses the challenge of optimizing chemical reaction conditions across vast chemical spaces by formulating the problem as a QUBO and leveraging a Digital Annealing Unit (DAU) for fast inference. It compares two QUBO implementations—an ML-based model that learns the Q matrix and a DAU-based model that directly encodes conditions into binary variables—and evaluates them on multiple HTE and Reaxys datasets. Results show that both approaches achieve accuracy comparable to classical baselines while delivering orders-of-magnitude speedups in inference, enabling rapid screening of billions of condition combinations and effective active-learning campaigns. The work demonstrates a promising hybrid workflow where ML training of the Q matrix is coupled with DAU-driven inference to accelerate iterative design of reaction conditions in chemical synthesis.

Abstract

Finding appropriate reaction conditions that yield high product rates in chemical synthesis is crucial for the chemical and pharmaceutical industries. However, due to the vast chemical space, conducting experiments for each possible reaction condition is impractical. Consequently, models such as QSAR (Quantitative Structure-Activity Relationship) or ML (Machine Learning) have been developed to predict the outcomes of reactions and illustrate how reaction conditions affect product yield. Despite these advancements, inferring all possible combinations remains computationally prohibitive when using a conventional CPU. In this work, we explore using a Digital Annealing Unit (DAU) to tackle these large-scale optimization problems more efficiently by solving Quadratic Unconstrained Binary Optimization (QUBO). Two types of QUBO models are constructed in this work: one using quantum annealing and the other using ML. Both models are built and tested on four high-throughput experimentation (HTE) datasets and selected Reaxys datasets. Our results suggest that the performance of models is comparable to classical ML methods (i.e., Random Forest and Multilayer Perceptron (MLP)), while the inference time of our models requires only seconds with a DAU. Additionally, in campaigns involving active learning and autonomous design of reaction conditions to achieve higher reaction yield, our model demonstrates significant improvements by adding new data, showing promise of adopting our method in the iterative nature of such problem settings. Our method can also accelerate the screening of billions of reaction conditions, achieving speeds millions of times faster than traditional computing units in identifying superior conditions. Therefore, leveraging the DAU with our developed QUBO models has the potential to be a valuable tool for innovative chemical synthesis.
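
To make the QUBO formulation in the abstract concrete, here is a minimal sketch of how one-hot encoded reaction conditions can be scored with a quadratic form $x^\top Q x$. Everything specific in it is an assumption for illustration: the group sizes, the random placeholder `Q`, the convention that lower energy stands for higher yield, and the helper names `encode`, `qubo_energy`, and `brute_force_minimum`. Exhaustive enumeration stands in for the annealer. It only visits valid one-hot vectors, whereas a QUBO submitted to an annealer would typically need penalty terms to enforce one choice per group.

```python
import itertools
import numpy as np

# Hypothetical condition groups; sizes are illustrative, not from the paper.
GROUP_SIZES = [3, 2, 2]  # e.g. catalysts, bases, solvents

def encode(choice):
    """One-hot encode one option index per group into a single binary vector."""
    x = np.zeros(sum(GROUP_SIZES), dtype=int)
    offset = 0
    for size, idx in zip(GROUP_SIZES, choice):
        x[offset + idx] = 1
        offset += size
    return x

def qubo_energy(Q, x):
    """QUBO objective x^T Q x for a binary vector x."""
    return float(x @ Q @ x)

def brute_force_minimum(Q):
    """Enumerate every valid combination; a stand-in for the annealer."""
    choices = itertools.product(*(range(s) for s in GROUP_SIZES))
    return min(choices, key=lambda c: qubo_energy(Q, encode(c)))

rng = np.random.default_rng(0)
n = sum(GROUP_SIZES)
# Placeholder Q (assumption): in practice it would be learned from yield data.
Q = np.triu(rng.normal(size=(n, n)))
best = brute_force_minimum(Q)
print("best combination:", best, "energy:", qubo_energy(Q, encode(best)))
```

The toy space has only 3 × 2 × 2 = 12 combinations, so enumeration is trivial. The abstract's point is that this stops being feasible on a CPU once the space reaches billions of combinations.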

Paper Structure (14 sections, 3 equations, 14 figures, 9 tables)

This paper contains 14 sections, 3 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Workflow of this work. The reaction data from HTE/Reaxys are encoded by one-hot encoding/fingerprints. The $Q$ matrix is then obtained through an ML-based/DAU-based model. With the $Q$ matrix, one can predict reaction yields from given reaction information.
  • Figure 2: Schematic illustration of (a) subset/whole data training and (b) active training. In subset training I, the subset data is used for training and predicting the yield of the corresponding subset, while in subset training II, the model's extrapolation ability is verified by training on three subsets and testing on the remaining one. For active training in (b), 100 data points are randomly sampled from each HTE dataset initially. 50 data points are either randomly or strategically added to the training data iteratively.
  • Figure 3: The parity plots of experimental yields and the corresponding predicted yields from the (a) ML-based model and (b) DAU-based model on the test sets. The ML-based models behind these parity plots were trained separately on the s1, s3, s7, and s9 subsets of the C-N cross-coupling dataset, with one-hot encoding applied to the agents. The strong dependencies between the predicted and experimental yields show the efficacy of the QUBO model in predicting reaction yields. (HTE datasets)
  • Figure 4: The results of the (a) ML-based models and (b) DAU-based models trained on the data of C-N cross-coupling, C-H arylation, amidation and deoxyfluorination in the HTE dataset with one-hot encoding. The parity plots are obtained by evaluating the corresponding held-out test set in different data subsets. The correlation between predicted yield and experimental (ground truth) yield in each prediction shows the generality of learnability of the QUBO model.
  • Figure 5: Results obtained from applying active learning models to diverse HTE datasets (with one-hot encoding), utilizing strategic or random methods for adding new data points in subsequent runs. Each data point denotes the mean top-k score, computed over five folds. The results show that the accuracy of active learning saturates around the 4th or 5th iteration, and that strategic selection outperforms random selection in the ML-based model. Similarly, in the DAU-based model, the performance plateaus around iterations 7 or 8, and strategic selection also shows an advantage in finding optimal conditions.
  • ...and 9 more figures
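
The active-training protocol in the captions of Figures 2(b) and 5 (100 randomly sampled starting points, then 50 points added per iteration either randomly or strategically) can be sketched as a simple loop. This is only an illustration under stated assumptions: the dataset is synthetic, the surrogate is an ordinary least-squares fit rather than the paper's QUBO models, "strategic" is assumed to mean greedily picking the pool points with the highest predicted yield, and the top-k score is assumed to be the fraction of the true top-k conditions already in the labeled set. The names `fit_linear` and `active_learning` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an HTE dataset: binary features and noisy yields.
n_points, n_features = 1000, 20
X = rng.integers(0, 2, size=(n_points, n_features)).astype(float)
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_points)

def fit_linear(X_train, y_train):
    """Least-squares surrogate (assumption; not the paper's QUBO model)."""
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return w

def active_learning(strategic, n_init=100, n_add=50, n_iter=5, k=10):
    """Return the assumed top-k score after each iteration."""
    local_rng = np.random.default_rng(1)
    labeled = local_rng.choice(n_points, size=n_init, replace=False).tolist()
    top_k = set(np.argsort(y)[-k:].tolist())
    scores = []
    for _ in range(n_iter):
        w = fit_linear(X[labeled], y[labeled])
        pool = np.setdiff1d(np.arange(n_points), labeled)
        if strategic:
            # Greedy choice: query the pool points with the highest predicted yield.
            picks = pool[np.argsort(X[pool] @ w)[-n_add:]]
        else:
            picks = local_rng.choice(pool, size=n_add, replace=False)
        labeled.extend(picks.tolist())
        scores.append(len(top_k & set(labeled)) / k)
    return scores

print("strategic:", active_learning(strategic=True))
print("random:   ", active_learning(strategic=False))
```

Because the labeled set only grows, the score in this sketch can never decrease from one iteration to the next; how quickly it saturates depends on the selection rule and the surrogate.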