Scaffold-Based Multi-Objective Drug Candidate Optimization

Agustin Kruel, Andrew D. McNaughton, Neeraj Kumar

TL;DR

The paper addresses the need to balance multiple drug-like properties in scaffold-constrained drug design. It proposes ScaMARS, a scaffold-focused graph-based Markov chain Monte Carlo framework that uses an MPNN prior and a flexible desirability score to navigate chemical space around a starting scaffold. ScaMARS reaches a 99.5% success rate and 84.6% diversity, outperforming the conditional model molGCT in multi-parameter optimization. The approach improves adaptability and efficiency in discovering scaffold-consistent drug candidates, and it offers insight into how scaffold choice and fragment vocabulary shape exploration.

Abstract

In therapeutic design, balancing various physicochemical properties is crucial for molecule development, similar to how Multiparameter Optimization (MPO) evaluates multiple variables to meet a primary goal. While many molecular features can now be predicted using in silico methods, aiding early drug development, the vast data generated from high-throughput virtual screening challenges the practicality of traditional MPO approaches. To address this, we introduce a scaffold-focused graph-based Markov chain Monte Carlo framework (ScaMARS) built to generate molecules with optimal properties. This innovative framework is capable of self-training and handling a wider array of properties, sampling different chemical spaces according to the starting scaffold. The benchmark analysis on several properties shows that ScaMARS has a diversity score of 84.6% and a much higher success rate, 99.5%, than conditional models. The integration of new features into MPO significantly enhances its adaptability and effectiveness in therapeutic design, facilitating the discovery of candidates that efficiently optimize multiple properties.

Paper Structure (10 sections, 2 equations, 5 figures, 1 table)

This paper contains 10 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: (a) ScaMARS workflow for proposing a new generation of molecules. First, the initial scaffold or the molecule from the previous generation is fed into the MPNN as the 'Prior'. The MPNN then proposes a new 'Proposal' molecule through edits (addition or subtraction of atom groups). The MPO scores of both the Prior and Proposal molecules are used in the annealed MCMC to decide whether the model accepts the proposal. If so, the Proposal is added to the generation and the cycle repeats. If not, the Prior molecule is kept unchanged for the next generation. Once the generation reaches the desired number of molecules, the MPNN loss is calculated on the success of the entire generation to favor beneficial edits. (b) The workflow of a conditional model, specifically molGCT, for comparison.
  • Figure 2: Two embedding dimensions of a Pairwise Controlled Manifold Approximation (PaCMAP) using all 134,588 unique molecules produced during a ScaMARS run. Six features were optimized (QED, TPSA, cLogP, nRotat, fCsp3, SA), starting from S-Adenosyl methionine (SAM), which is circled in red and illustrated on the right. Arrows track each modification this molecule underwent at each generation. Circled in red and shown on the left is the final molecule generated for one of the 1,000 paths.
  • Figure 3: Closer look at and analysis of the 1,852-compound cluster housing the original starting SAM scaffold. The red line follows the medoid of the cluster to its point on the graph. Heatmap shading on the medoid compound corresponds to each atom's contribution to the compound's average Tanimoto similarity to the rest of the cluster, with darker red indicating a greater loss of similarity when that atom is removed. Labelled with a red 'X' on the graph and visualized to the left of the medoid is the highest-scoring compound in this cluster, with a score of 0.694.
  • Figure 4: PaCMAP of the final molecules produced by each model, embedded onto the previously calculated 134,588-molecule, 2048-dimensional space. (a) ScaMARS molecules colored by score. (b) molGCT molecules.
  • Figure 5: Possible synthetic route proposed by ASKCOS for the highest-scoring SAM analogue generated through ScaMARS.
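
The accept/reject loop described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the acceptance rule is a plain Metropolis criterion, the cooling schedule is geometric, and the names acceptance_probability, run_chain, toy_propose, toy_score, and VOCAB are all hypothetical. The toy proposal function stands in for the MPNN, the toy score stands in for the MPO score, and the MPNN training step is omitted.

```python
import math
import random

# Hypothetical fragment vocabulary; the real vocabulary is not given here.
VOCAB = ("C", "N", "O")


def acceptance_probability(prior_score, proposal_score, temperature):
    """Metropolis-style rule (an assumption): always accept improvements,
    accept worse proposals with a probability that shrinks as T cools."""
    if proposal_score >= prior_score:
        return 1.0
    return math.exp((proposal_score - prior_score) / temperature)


def toy_propose(mol, rng):
    """Stand-in for the MPNN: add or remove one fragment, keeping the scaffold."""
    if len(mol) > 1 and rng.random() < 0.5:
        return mol[:-1]                     # subtraction edit
    return mol + (rng.choice(VOCAB),)       # addition edit


def toy_score(mol):
    """Stand-in for the MPO score: highest (1.0) at four fragments."""
    return 1.0 / (1.0 + abs(len(mol) - 4))


def run_chain(start, propose, score, n_steps, t_start=1.0, t_end=0.01, seed=0):
    """Run one annealed chain; returns the molecule kept at each step."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    history = [current]
    for step in range(n_steps):
        # Geometric cooling from t_start to t_end (schedule is an assumption).
        frac = step / max(n_steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        p_accept = acceptance_probability(current_score, candidate_score, temperature)
        if rng.random() < p_accept:
            current, current_score = candidate, candidate_score  # keep Proposal
        # Otherwise the Prior is carried over unchanged.
        history.append(current)
    return history


history = run_chain(("scaffold",), toy_propose, toy_score, n_steps=200)
print(history[-1], toy_score(history[-1]))
```

Because removal edits never touch the first element, every molecule in the chain retains the starting scaffold, which mirrors the scaffold-constrained search described above. In the toy setup the temperature only controls how often a worse proposal is tolerated; early steps explore freely and late steps behave almost greedily.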