Accelerating the inference of string generation-based chemical reaction models for industrial applications

Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert

TL;DR

The paper tackles slow inference in template-free SMILES-to-SMILES transformers used for reaction product prediction and single-step retrosynthesis. It adapts speculative decoding, a draft-and-verify approach from LLM inference, to SMILES generation by using subsequences copied from the source string as drafts for the output. Re-implementing the Molecular Transformer in PyTorch Lightning, the authors demonstrate more than threefold speedups on USPTO MIT and USPTO 50K while preserving top-k accuracy. The results indicate practical gains for industrial computer-aided synthesis planning, with limitations tied to batch size and beam width. The authors release their code and outline directions for improved drafting strategies to further boost throughput.

Abstract

Template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest for industrial applications in computer-aided synthesis planning systems due to their state-of-the-art accuracy. However, they suffer from slow inference speed. We present a method to accelerate inference in autoregressive SMILES generators through speculative decoding by copying query string subsequences into target strings in the right places. We apply our method to the Molecular Transformer implemented in PyTorch Lightning and achieve over 3X faster inference in reaction prediction and single-step retrosynthesis, with no loss in accuracy.

Paper Structure

This paper contains 12 sections, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Both reaction product prediction and single-step retrosynthesis can be formulated as SMILES-to-SMILES translation and approached with a model like an encoder-decoder transformer.
  • Figure 2: Speculative decoding accelerates product prediction with the Molecular Transformer or a similar autoregressive SMILES generator. Before generating an output sequence, we prepare a list of subsequences of a chosen length (e.g., four tokens) from the tokenized SMILES query of the reactants. Then, at every generation step, the model can copy up to four tokens from one of the draft sequences to the output, thus generating from one to five tokens in one forward pass.
  • Figure 3: An example of the first two iterations of candidate-sequence sampling in speculative beam search. Here, we select the two best candidates at each iteration. The first forward pass generates 12 candidate sequences. The second forward pass generates 24 sequences. The draft length in this example is 10. The best sequences in the first iteration are c1c[nH] and c1cn(C(=O)O. The best sequences after the second iteration are c1c[nH]c2ccc(C(C)= and c1cn(C(=O)OC(C)(C)C)c2.
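
The draft-and-verify step described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The simplified tokenizer, the function names (tokenize, draft_windows, accept_draft, speculative_greedy_decode, make_oracle), and the toy "oracle" that stands in for the transformer are all assumptions made for illustration. The sketch covers greedy decoding only, not the speculative beam search of Figure 3.

```python
import re
from typing import Callable, List, Sequence, Tuple

# Simplified SMILES tokenizer: an assumption for this sketch, not the paper's tokenizer.
_TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    return _TOKEN_RE.findall(smiles)


def draft_windows(tokens: Sequence[str], draft_len: int = 4) -> List[List[str]]:
    """All contiguous windows of `draft_len` tokens taken from the query."""
    if len(tokens) <= draft_len:
        return [list(tokens)]
    return [list(tokens[i:i + draft_len])
            for i in range(len(tokens) - draft_len + 1)]


def accept_draft(draft: Sequence[str], predictions: Sequence[str]) -> List[str]:
    """Keep the longest draft prefix the model agrees with, plus one model token.

    predictions[i] is the model's greedy next token given prefix + draft[:i],
    so len(predictions) == len(draft) + 1. Returns 1 to len(draft) + 1 tokens.
    """
    accepted: List[str] = []
    for i, tok in enumerate(draft):
        if predictions[i] != tok:
            break
        accepted.append(tok)
    accepted.append(predictions[len(accepted)])
    return accepted


def speculative_greedy_decode(
    query_tokens: Sequence[str],
    predict: Callable[[Sequence[str], Sequence[str]], List[str]],
    draft_len: int = 4,
    max_len: int = 200,
    eos: str = "<eos>",
) -> Tuple[List[str], int]:
    """Greedy decoding in which every step tries all drafts and keeps the best one."""
    drafts = draft_windows(query_tokens, draft_len)
    output: List[str] = []
    passes = 0
    while len(output) < max_len:
        best: List[str] = []
        # A real model would score all drafts in a single batched forward pass;
        # the Python loop here only imitates that.
        for draft in drafts:
            candidate = accept_draft(draft, predict(output, draft))
            if len(candidate) > len(best):
                best = candidate
        passes += 1
        if eos in best:
            output.extend(best[:best.index(eos)])
            break
        output.extend(best)
    return output, passes


def make_oracle(target: Sequence[str], eos: str = "<eos>"):
    """Hypothetical stand-in for the model: it always predicts the known target."""
    full = list(target) + [eos]

    def predict(prefix: Sequence[str], draft: Sequence[str]) -> List[str]:
        start = len(prefix)
        return [full[start + i] if start + i < len(full) else eos
                for i in range(len(draft) + 1)]

    return predict


if __name__ == "__main__":
    query = tokenize("CC(=O)Cl.OCC")      # toy reactants
    target = tokenize("CC(=O)OCC")        # toy product
    tokens, n_passes = speculative_greedy_decode(query, make_oracle(target))
    print("".join(tokens), n_passes)
```

In this toy run, the first step accepts a full four-token draft plus one model token, which is the "one to five tokens in one forward pass" behavior the caption describes. Because a rejected draft still yields the model's own next token, the output is the same as in ordinary greedy decoding. Only the number of forward passes changes.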