Retro-BLEU: Quantifying Chemical Plausibility of Retrosynthesis Routes through Reaction Template Sequence Analysis
Junren Li, Lei Fang, Jian-Guang Lou
TL;DR
Retrosynthesis route plausibility is hard to quantify with traditional metrics. This work introduces Retro-BLEU, a statistics-based metric that adapts the BLEU idea to reaction-template sequences by combining a length penalty with bigram overlap against known routes: $Score_{\rm Retro-BLEU}(r) = \exp\left(\frac{L}{\max(L,\mathrm{len}(r))}\right) + \exp\left(f_{2}(r)\right)$, using bigrams ($n=2$) and a length parameter $L \approx 3$. Empirically, Retro-BLEU differentiates patent-derived routes from model-generated ones and outperforms baselines in top-k ranking. Template-bigram analysis gives further insight into which steps are plausible and which are unproductive, though performance is limited by observed-template coverage and reaction-space sparsity. The method offers a practical screening tool to accelerate CASP-based route selection and motivates ongoing template-library updates and the incorporation of additional feasibility factors such as yields and costs.
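The score above can be illustrated with a minimal Python sketch. It rests on several assumptions that the summary does not state: a route is simplified to a linear sequence of template identifiers (real routes may be trees), $f_2(r)$ is taken to be the fraction of the route's template bigrams that also occur in a reference set of known routes, and a route too short to contain any bigram gets $f_2 = 0$. The function names (`template_bigrams`, `collect_reference_bigrams`, `bigram_overlap`, `retro_bleu`) are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Hashable, Iterable, List, Sequence, Set, Tuple

Bigram = Tuple[Hashable, Hashable]


def template_bigrams(route: Sequence[Hashable]) -> List[Bigram]:
    """Return consecutive template pairs of a route (assumed linear here)."""
    return list(zip(route, route[1:]))


def collect_reference_bigrams(routes: Iterable[Sequence[Hashable]]) -> Set[Bigram]:
    """Gather every template bigram observed in the known (reference) routes."""
    reference: Set[Bigram] = set()
    for route in routes:
        reference.update(template_bigrams(route))
    return reference


def bigram_overlap(route: Sequence[Hashable], reference: Set[Bigram]) -> float:
    """Assumed definition of f_2(r): share of the route's bigrams seen in the reference set."""
    bigrams = template_bigrams(route)
    if not bigrams:
        # Assumption: a route with fewer than two steps has no bigram evidence.
        return 0.0
    return sum(b in reference for b in bigrams) / len(bigrams)


def retro_bleu(route: Sequence[Hashable], reference: Set[Bigram], L: int = 3) -> float:
    """Score(r) = exp(L / max(L, len(r))) + exp(f_2(r)), as written in the summary."""
    length_term = math.exp(L / max(L, len(route)))
    overlap_term = math.exp(bigram_overlap(route, reference))
    return length_term + overlap_term


# Toy data: template IDs are placeholders, not real reaction templates.
known_routes = [["T1", "T2", "T3"], ["T2", "T4"]]
reference_bigrams = collect_reference_bigrams(known_routes)

short_familiar = ["T1", "T2", "T4"]                  # every bigram is in the reference set
long_unfamiliar = ["T9", "T8", "T7", "T6", "T5", "T4"]  # no bigram is in the reference set

score_familiar = retro_bleu(short_familiar, reference_bigrams)
score_unfamiliar = retro_bleu(long_unfamiliar, reference_bigrams)
```

Under these assumptions, a route no longer than $L$ whose bigrams are all covered by the reference set attains the maximum value $2e$, while longer routes built from unseen template pairs score lower on both terms, which is the ordering a screening step would rely on.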
Abstract
Computer-assisted methods have emerged as valuable tools for retrosynthesis analysis. However, quantifying the plausibility of generated retrosynthesis routes remains a challenging task. We introduce Retro-BLEU, a statistical metric adapted from the well-established BLEU score in machine translation, to evaluate the plausibility of retrosynthesis routes based on the analysis of reaction template sequences. We demonstrate the effectiveness of Retro-BLEU by applying it to a diverse set of retrosynthesis routes generated by state-of-the-art algorithms and comparing its performance with that of other evaluation metrics. The results show that Retro-BLEU is capable of differentiating between plausible and implausible routes. Furthermore, we provide insights into the strengths and weaknesses of Retro-BLEU, paving the way for future developments and improvements in this field.
