FSscore: A Machine Learning-based Synthetic Feasibility Score Leveraging Human Expertise

Rebecca M. Neeser, Bruno Correia, Philippe Schwaller

TL;DR

This work introduces the Focused Synthesizability score (FSscore), which uses machine learning to rank structures based on their relative ease of synthesis, and showcases how a human-in-the-loop framework can be utilized to optimize the assessment of synthetic feasibility for various chemical applications.

Abstract

Determining whether a molecule can be synthesized is crucial in chemistry and drug discovery, as it guides experimental prioritization and molecule ranking in de novo design tasks. Existing scoring approaches to assess synthetic feasibility struggle to extrapolate to new chemical spaces or fail to discriminate based on subtle differences such as chirality. This work addresses these limitations by introducing the Focused Synthesizability score (FSscore), which uses machine learning to rank structures based on their relative ease of synthesis. First, a baseline trained on an extensive set of reactant-product pairs is established, which is then refined with expert human feedback tailored to specific chemical spaces. This targeted fine-tuning improves performance on these chemical scopes, enabling more accurate differentiation between molecules that are hard and easy to synthesize. The FSscore showcases how a human-in-the-loop framework can be utilized to optimize the assessment of synthetic feasibility for various chemical applications.

Paper Structure (6 sections, 8 figures, 1 table)

Figures (8)

  • Figure 1: Schematic overview of the data sources, architecture and training pipeline. For pre-training, data is extracted from reaction datasets, while fine-tuning uses data from a custom chemical scope of interest that can be labeled in various ways. Training the FSscore requires two forward passes, one for each molecule in the binary preference pair, while at inference time one forward pass is sufficient to obtain the score.
  • Figure 2: Results showcasing the ability to differentiate molecules originating from MOSES (Polykovskiy et al., 2020) from those in COCONUT (Sorokina et al., 2021). The latter, being natural products, are expected to be more complex. The ROC curves detail the power to discriminate MOSES from COCONUT. The arrows in the distribution plot indicate the direction of higher synthetic feasibility.
  • Figure 3: Distributions showing the ability to differentiate molecules with assigned tetrahedral chirality from their unassigned counterparts. The desired prediction would score the assigned molecules as more complex, resulting in negative delta values (assigned - unassigned) in the delta plot.
  • Figure 4: ROC curves showcasing the classification power of the various models to separate hard-to-synthesize (HS) from easy-to-synthesize (ES) molecules in the CP test set.
  • Figure 5: FSscore difference between the full PROTAC and its most complex fragment (either of the two ligands or the linker).
  • ...and 3 more figures
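
The Figure 1 caption notes that training needs two forward passes per binary preference pair, while inference needs only one. The sketch below illustrates that pattern with a toy pairwise ranking objective. It is not the paper's architecture: the linear `score` function, the two-element feature vectors, and the toy pairs are all hypothetical stand-ins, and the logistic loss on the score difference is one common choice for learning from pairwise preferences. Following the Figure 3 caption, a higher score is taken to mean easier to synthesize.

```python
import math

def score(weights, features):
    """One forward pass: map a (hypothetical) feature vector to a scalar score."""
    return sum(w * x for w, x in zip(weights, features))

def pair_loss(weights, feats_a, feats_b, a_preferred):
    """Logistic loss on the score difference of one preference pair.

    Uses two forward passes, one per molecule in the pair.
    """
    sign = 1.0 if a_preferred else -1.0
    z = sign * (score(weights, feats_a) - score(weights, feats_b))
    return math.log1p(math.exp(-z))

def train(pairs, n_features, lr=0.1, epochs=200):
    """Plain gradient descent on the pairwise logistic loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for feats_a, feats_b, a_preferred in pairs:
            sign = 1.0 if a_preferred else -1.0
            z = sign * (score(weights, feats_a) - score(weights, feats_b))
            # Derivative of log(1 + exp(-z)) with respect to the score difference.
            grad_diff = -sign / (1.0 + math.exp(z))
            weights = [
                w - lr * grad_diff * (xa - xb)
                for w, xa, xb in zip(weights, feats_a, feats_b)
            ]
    return weights

# Toy preference pairs: (features_a, features_b, a_is_easier_to_synthesize).
# The two features are made up, e.g. counts of stereocenters and rings.
pairs = [
    ([0.0, 1.0], [2.0, 3.0], True),
    ([3.0, 2.0], [1.0, 0.0], False),
    ([1.0, 1.0], [1.0, 4.0], True),
]

initial_loss = sum(pair_loss([0.0, 0.0], a, b, y) for a, b, y in pairs)
weights = train(pairs, n_features=2)
final_loss = sum(pair_loss(weights, a, b, y) for a, b, y in pairs)

# Inference: a single forward pass per molecule yields its score.
easy_score = score(weights, [0.0, 1.0])
hard_score = score(weights, [2.0, 3.0])
```

Because the loss depends only on score differences, the learned scores are meaningful for ranking molecules relative to one another rather than as absolute values.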