Finding Pathways in Reaction Networks guided by Energy Barriers using Integer Linear Programming

Adittya Pal, Rolf Fagerberg, Jakob Lykke Andersen, Peter Dittrich, Daniel Merkle

TL;DR

This work addresses the challenge of finding kinetically plausible synthesis pathways in large reaction networks by modeling the network as a directed hypergraph and formulating pathway search as an integer linear program over integer hyperflows. A linear, physically motivated objective minimizes ∑_{e∈E} f_e (G_e + RT log D) to maximize the overall pathway probability, where G_e are reaction barriers and D = ∑_{i∈E} exp(−G_i/(RT)). The authors introduce an automated pipeline that estimates energy barriers using OpenBabel, xTB, RDKit, ASE NeuralNEB, and Nudged Elastic Band, enabling kinetic annotation of generative networks. The method is demonstrated on a glycolonitrile–NH_3–H_2O network expanded to 44 vertices and 116 hyperedges, yielding multiple high-ranking, structurally distinct pathways to glycine and glycolic acid and illustrating scalability and practicality for large networks.
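The link between the linear objective and the pathway probability can be made explicit with a short derivation. This is a sketch under one assumption that the summary does not spell out: each reaction $e$ is assigned the normalized Boltzmann-type weight $p_e = \exp(-G_e/(RT))/D$, and a pathway with integer flow $f$ has probability $\prod_e p_e^{f_e}$. The symbols $p_e$ and $P(f)$ are introduced here for illustration only.

```latex
\begin{align*}
P(f) &= \prod_{e \in E} p_e^{\,f_e},
  \qquad p_e = \frac{\exp\!\left(-G_e/(RT)\right)}{D},
  \qquad D = \sum_{i \in E} \exp\!\left(-G_i/(RT)\right) \\
-RT \log P(f) &= -RT \sum_{e \in E} f_e \left( -\frac{G_e}{RT} - \log D \right)
  = \sum_{e \in E} f_e \left( G_e + RT \log D \right)
\end{align*}
```

Because $RT > 0$, minimizing the right-hand side is the same as maximizing $P(f)$. Under this assumption each coefficient $G_e + RT \log D$ is non-negative, since $D \ge \exp(-G_e/(RT))$ for every $e$, so adding flow on a hyperedge never lowers the objective.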

Abstract

Analyzing synthesis pathways for target molecules in a chemical reaction network annotated with information on the kinetics of individual reactions is an area of active study. This work presents a computational methodology for searching for pathways in reaction networks which is based on integer linear programming and the modeling of reaction networks by directed hypergraphs. Often multiple pathways fit the given search criteria. To rank them, we develop an objective function based on physical arguments maximizing the probability of the pathway. We furthermore develop an automated pipeline to estimate the energy barriers of individual reactions in reaction networks. Combined, the methodology facilitates flexible and kinetically informed pathway investigations on large reaction networks by computational means, even for networks coming without kinetic annotation, such as those created via generative approaches for expanding molecular spaces.

Paper Structure

This paper contains 18 sections, 13 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: A directed hypergraph (top-left), where circles are vertices and the squares are hyperedges. The other three figures show possible integer hyperflows in the hypergraph, each representing a pathway, where hyperedges with non-zero flow are drawn in bold. The inflow and the outflow are depicted as arrows in and out of the source and target vertices of the pathway. Hyperedges with multiple copies of a vertex as source or target are depicted with parallel arrows. See for instance $e_2$, which has $v_3$ twice as a target and thus represents the reaction $v_0\rightarrow 2 v_3$.
  • Figure 2: Top: A schematic depiction of the five best pathways to glyoxylic acid according to the chosen scoring function (energy levels not shown exactly to scale). The structures of the molecules in the corresponding vertices are listed in Table \ref{tab:molecular_structure}, and the weights assigned to the hyperedges are listed in Table \ref{tab:pathways1}. Bottom: The energy profile of the best-scoring pathway (in blue) from above (energy levels shown to scale).
  • Figure 3: The six best pathways to glyoxylic acid according to the chosen scoring function. The structures of the molecules in the vertices are listed in Table \ref{tab:molecular_structure}, and the weights assigned to the hyperedges are listed in Table \ref{tab:pathways2}.
  • Figure 4: Rule 1: Addition of water or ammonia to hydrogen cyanide (nitrile group) where $X \in \{\mathrm{N}, \mathrm{O}\}$ to form an imine. The reverse of this rule was used while expanding the hypergraph.
  • Figure 5: Rule 2: Addition of water or ammonia to a double bond where $X, Y \in \{\mathrm{N}, \mathrm{O}\}$ (in an imine or carbonyl group respectively). The reverse of this rule was also used while expanding the hypergraph.
  • ...and 2 more figures
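
The hypergraph and integer hyperflow picture from the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only the hyperedge `e2` (the reaction v0 → 2 v3) comes from the caption, while `e1`, `e3`, the barrier values, the temperature, and all function names are hypothetical. It checks flow conservation for a candidate flow and evaluates the linear objective ∑ f_e (G_e + RT log D) from the TL;DR; it does not solve the integer linear program itself.

```python
import math
from collections import defaultdict

R = 8.314462618e-3  # gas constant in kJ/(mol*K)
T = 298.15          # assumed temperature in K

# Each hyperedge maps to (sources, targets); both are multisets
# written as {vertex: multiplicity}.
EDGES = {
    "e1": ({"v0": 1}, {"v1": 1}),           # hypothetical
    "e2": ({"v0": 1}, {"v3": 2}),           # v0 -> 2 v3, from the Figure 1 caption
    "e3": ({"v1": 1, "v3": 1}, {"v4": 1}),  # hypothetical
}
BARRIERS = {"e1": 50.0, "e2": 80.0, "e3": 65.0}  # made-up barriers in kJ/mol


def net_production(flow):
    """Return produced-minus-consumed copies per vertex, omitting zeros."""
    net = defaultdict(int)
    for edge, f in flow.items():
        sources, targets = EDGES[edge]
        for vertex, mult in sources.items():
            net[vertex] -= f * mult
        for vertex, mult in targets.items():
            net[vertex] += f * mult
    return {v: n for v, n in net.items() if n != 0}


def is_pathway(flow, inputs, outputs):
    """Flow conservation: net consumption only at inputs, net production only at outputs."""
    if any(not isinstance(f, int) or f < 0 for f in flow.values()):
        return False
    return all(
        (n < 0 and v in inputs) or (n > 0 and v in outputs)
        for v, n in net_production(flow).items()
    )


def edge_weights(barriers):
    """Objective coefficient G_e + RT log D for every hyperedge."""
    rt = R * T
    log_d = math.log(sum(math.exp(-g / rt) for g in barriers.values()))
    return {e: g + rt * log_d for e, g in barriers.items()}


def pathway_cost(flow, weights):
    """Linear objective: sum over hyperedges of f_e * weight_e."""
    return sum(f * weights[e] for e, f in flow.items())


flow = {"e1": 1, "e2": 1, "e3": 1}
print(net_production(flow))                    # {'v0': -2, 'v3': 1, 'v4': 1}
print(is_pathway(flow, {"v0"}, {"v3", "v4"}))  # True
print(pathway_cost(flow, edge_weights(BARRIERS)))
```

In this toy flow, two copies of v0 enter as inflow, while one copy each of v3 and v4 leave as outflow; the second copy of v3 produced by `e2` is consumed by `e3`. An ILP solver would search over all such integer flows subject to the same conservation constraints, rather than checking a single hand-written candidate.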