Saturation-Based Atom Provenance Tracing in Chemical Reaction Networks

Marcel Friedrichs, Daniel Merkle

TL;DR

This work tackles the combinatorial explosion in atom tracing within chemical reaction networks by introducing a saturation-based framework that operates directly on atom–atom maps. It uses Kleisli morphisms over the powerset monad to propagate atom provenance and exhaustively enumerate all labeled configurations, represented as a template-instance hypergraph. The approach supports multiplicities, reuse, and synchronized multi-educt transformations, and enables projections focused on target compounds or isotopes. Demonstrations on the TCA cycle and glycolysis show automatic reproduction of known labeling patterns and discovery of steady-state labeling behavior, with potential applications in experiment design and in domains beyond chemistry that feature compositional mappings.

Abstract

Atom tracing is essential for understanding the fate of labeled atoms in biochemical reaction networks, yet existing computational methods either simplify label correlations or suffer from combinatorial explosion. We introduce a saturation-based framework for enumerating labeling patterns that operates directly on atom-atom maps without requiring flux data or experimental measurements. The approach models reaction semantics using Kleisli morphisms in the powerset monad, allowing for compositional propagation of atom provenance through reaction networks. By iteratively saturating all possible educt combinations of reaction rules, the method exhaustively enumerates labeled molecular configurations, including multiplicities and reuse. Because arbitrary initial labeling patterns are allowed, including identical or distinct labels, the method expands only isotopomers reachable from these inputs, which keeps the configuration space as small as necessary and avoids the full combinatorial growth characteristic of previous approaches. In principle, every atom could even carry a distinct identifier (e.g., tracing all carbon atoms individually), illustrating the generality of the framework beyond practical experimental limitations. The resulting template-instance hypergraph captures the complete flow of atoms between compounds and supports projections tailored to experimental targets. Customizable labeling sets significantly reduce generated network sizes, providing efficient and exact atom traces focused on specific compounds or available isotopes. Applications to the tricarboxylic acid cycle and glycolytic pathways demonstrate that the method automatically reproduces known labeling patterns and discovers steady-state labeling behavior. The framework offers a scalable, mechanistically transparent, and generalizable foundation for isotopomer modeling and experiment design.
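
The saturation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Instance`, `Rule`, `bind`, `fire`, and `saturate`, as well as the toy compounds `F` (one carbon) and `G` (two carbons) and their atom maps, are assumptions made for illustration. Each rule is treated as a set-valued function (a Kleisli arrow of the powerset monad), results are merged by union, and rules are reapplied until no new labeled instance appears. Multiplicities and the hypergraph edges are deliberately left out.

```python
from itertools import product as cartesian
from typing import Callable, FrozenSet, Iterable, NamedTuple, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

class Instance(NamedTuple):
    compound: str               # compound template name (hypothetical)
    origins: Tuple[str, ...]    # one origin label per atom position

class Rule(NamedTuple):
    educts: Tuple[str, ...]     # educt compound names, in order
    product: str                # product compound name
    # For each product atom: (index of educt, atom position in that educt).
    atom_map: Tuple[Tuple[int, int], ...]

def bind(xs: Iterable[A], f: Callable[[A], FrozenSet[B]]) -> FrozenSet[B]:
    """Powerset-monad bind: apply a set-valued function and union the results."""
    return frozenset(y for x in xs for y in f(x))

def fire(rule: Rule) -> Callable[[Tuple[Instance, ...]], FrozenSet[Instance]]:
    """Turn a rule into a set-valued arrow on tuples of labeled educts."""
    def arrow(combo: Tuple[Instance, ...]) -> FrozenSet[Instance]:
        origins = tuple(combo[e].origins[p] for e, p in rule.atom_map)
        return frozenset({Instance(rule.product, origins)})
    return arrow

def saturate(seeds: Iterable[Instance], rules: Iterable[Rule]) -> FrozenSet[Instance]:
    """Apply every rule to every educt combination until a fixed point is reached."""
    rules = tuple(rules)
    pool = frozenset(seeds)
    while True:
        grown = set(pool)
        for rule in rules:
            slots = [[i for i in pool if i.compound == name] for name in rule.educts]
            grown |= bind(cartesian(*slots), fire(rule))
        if len(grown) == len(pool):   # grown is a superset, so equal size means no change
            return pool
        pool = frozenset(grown)

# Assumed toy network: F + F -> G, and G -> F for either atom of G.
RULES = (
    Rule(("F", "F"), "G", ((0, 0), (1, 0))),
    Rule(("G",), "F", ((0, 0),)),
    Rule(("G",), "F", ((0, 1),)),
)
SEEDS = (Instance("F", ("a",)), Instance("F", ("b",)))
RESULT = saturate(SEEDS, RULES)
SINGLE = saturate((Instance("F", ("a",)),), RULES)
```

With the two distinct seed labels `a` and `b`, `RESULT` holds the two `F` instances plus all four labelings of `G`. Seeding with a single label (`SINGLE`) yields only `F(a)` and `G(a, a)`, which mirrors the abstract's point that only configurations reachable from the chosen inputs are expanded.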

Paper Structure

This paper contains 19 sections, 25 equations, 10 figures.

Figures (10)

  • Figure 1: Left: Chemical reaction network (CRN) hypergraph of a formaldehyde toy network generated by a graph rewriting system from a set of reaction rules. Carbon atoms of all compounds are assigned unique position IDs in blue. Right: Carbon atom-atom maps of all reactions and symmetries, labeled $s0$ to $s6$. Three pairs are symmetries of each other: $(s1, s2)$, $(s3, s4)$, and $(s5, s6)$.
  • Figure 2: Unique carbon atom position IDs in blue for formaldehyde, glycolaldehyde, 1,2-ethendiol, 2,3,4-trihydroxybutanal, and glyceraldehyde. Their representation as compound templates is $(1)$, $(2, 3)$, $(4, 5)$, $(6, 7, 8, 9)$, and $(10, 11, 12)$.
  • Figure 3: A: Glucose with position IDs assigned to carbon atoms in blue. B: Glucose-1-$^{13}C$-2-$^{14}C$. C: Glucose-U-$^{13}C_6$.
  • Figure 4: Five reaction rules from the formaldehyde toy reaction network showing the mapping of educt to product position IDs via slot indices. The slot indices are shown above the product position IDs. Dashed lines represent the mapping from educt position IDs to slot indices. A: $s2$, B: $s5$, C: $reverse(s3)$, D: $s0$, and E: $reverse(s0)$.
  • Figure 5: Full saturation graph for the formaldehyde toy network. Each vertex represents a labeled template instance as a table of the compound's name, position IDs, and origin IDs, together with the chemical structure with origin IDs assigned to the carbons. Edge labels give the reactions that produced the template instance.
  • ...and 5 more figures