Data-driven discovery of chemical reaction networks

Abraham Reyes-Velazquez, Stefan Güttel, Igor Larrosa, Jonas Latz

TL;DR

The paper addresses the challenge of automatically reconstructing full chemical reaction networks from concentration time-series data. It introduces a unified SINDy-based framework that uses an integral formulation of CRN dynamics and a convex post-processing step to map inferred terms to mass-action mechanisms, along with rigorous error analysis favoring the integral variant. The authors demonstrate, both theoretically and empirically, that integral SINDy yields superior noise robustness and more accurate network graph recovery across benchmark CRNs, including open networks requiring a zero-complex workaround. The work enables fully automated, data-driven chemical mechanism discovery and provides practical guidance for recovering CRNs from noisy experimental data.

Abstract

We propose a unified framework that allows for the full mechanistic reconstruction of chemical reaction networks (CRNs) from concentration data. The framework utilizes an integral formulation of the differential equations governing the chemical reactions, followed by an automatic procedure to recover admissible mass-action mechanisms from the equations. We provide theoretical justification for the use of integral formulations using analytical and numerical error bounds. The integral formulation is demonstrated to offer superior robustness to noise and improved accuracy in both rate-law and graph recovery when compared to other commonly used formulations. Together, our developments advance the goal of fully automated, data-driven chemical mechanism discovery.
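
To make the contrast between integral and differential formulations concrete, here is a minimal sketch on a toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the single reaction A → B with rate constant `k = 1.5`, the three-term monomial library, the noise level, the threshold, and the use of sequentially thresholded least squares (the standard SINDy solver) as the sparse regression step. The paper's convex post-processing step that maps recovered terms to mass-action mechanisms is not reproduced here.

```python
import numpy as np

def library(X):
    """Candidate monomials [x_A, x_B, x_A*x_B], one row per time point."""
    a, b = X[:, 0], X[:, 1]
    return np.column_stack([a, b, a * b])

def cumtrapz(F, t):
    """Cumulative trapezoidal integral of each column of F, starting at 0."""
    dt = np.diff(t)[:, None]
    steps = 0.5 * dt * (F[1:] + F[:-1])
    return np.vstack([np.zeros((1, F.shape[1])), np.cumsum(steps, axis=0)])

def stlsq(A, B, threshold=0.1, iters=10):
    """Sequentially thresholded least squares for A @ C ~ B."""
    C = np.linalg.lstsq(A, B, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(C) < threshold
        C[small] = 0.0
        # Refit each output column using only the surviving library terms.
        for j in range(B.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                C[keep, j] = np.linalg.lstsq(A[:, keep], B[:, j], rcond=None)[0]
    return C

def fit_integral(X, t):
    """Integral form: x(t_i) - x(t_0) ~ (integral of library from t_0 to t_i) @ C."""
    return stlsq(cumtrapz(library(X), t), X - X[0])

def fit_differential(X, t):
    """Differential form: finite-difference dx/dt ~ library(x) @ C."""
    return stlsq(library(X), np.gradient(X, t, axis=0))

# Toy data (assumed): A -> B with rate k, so x_A = exp(-k t), x_B = 1 - exp(-k t).
k = 1.5
t = np.linspace(0.0, 3.0, 201)
X_clean = np.column_stack([np.exp(-k * t), 1.0 - np.exp(-k * t)])
rng = np.random.default_rng(0)
X_noisy = X_clean + rng.uniform(-0.01, 0.01, X_clean.shape)  # bounded additive noise

# Ground truth: only the x_A row is active, with entries (-k, +k).
C_true = np.array([[-k, k], [0.0, 0.0], [0.0, 0.0]])
for name, fit in [("integral", fit_integral), ("differential", fit_differential)]:
    err = np.max(np.abs(fit(X_noisy, t) - C_true))
    print(f"{name:>12}: max coefficient error = {err:.3g}")
```

The only difference between the two fits is where the numerical operator acts: the differential form differentiates the noisy measurements, while the integral form integrates the library terms and leaves the measurements undifferentiated. The printed errors depend on the noise realisation and the threshold, so they should be read as an illustration of the setup rather than as a reproduction of the paper's results.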

Paper Structure (25 sections, 4 theorems, 89 equations, 19 figures, 1 algorithm)

Key Result

Theorem 1

Consider $M$ scalar four-times continuously differentiable time series $\{x_\alpha\}_{\alpha=1}^M \subset C^4([t_0,t_n])$. Measurements $x_\alpha(t_i) = x_{\alpha,i}$ are taken at equispaced times. These observations are corrupted by i.i.d. bounded additive noise $\xi_{\alpha,i}$, independent across $\alpha$ and $i$. Let $\mathbf{X} = [ x_{\alpha,i} ] \in \mathbb{R}^{M \times (n+1)}$ denote the d

Figures (19)

  • Figure 1: Examples of CRNs being reversible, weakly reversible, and open
  • Figure 2: Example of an open chemical reaction network composed of a closed subgraph connected to multiple peripheral source and sink complexes. Each source feeds into, and each sink receives output from, the closed portion of the network.
  • Figure 3: Graph of the M1 mechanism
  • Figure 4: M1 reconstruction error for integration-based and differentiation-based recovery methods across increasing number of time points. Each faint line corresponds to one of 100 independent trials; bold lines represent the geometric mean of all realisations. In each subplot, we also include two dashed reference lines to assess the decay rate with respect to the number of time points. One line follows the theoretical error bound from Theorem 2 and the other was fitted to the observed numerical decay rate.
  • Figure 5: Left: M1 support mismatch between the ground-truth matrix $\mathbf{C}_{\mathrm{ex}}$ and recovered coefficient matrices over 1000 trials, for both differentiation and integration-based formulations, with varying number of time points. Right: Quality of the recovered Kirchhoff matrices.
  • ...and 14 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Corollary 3
  • Theorem 4
  • Proof of Theorem 1
  • Proof of Corollary 3