Functional classification of metabolic networks

Jorge Reyes, Jörn Dunkel

TL;DR

To address functional classification of chemical reaction networks, the authors introduce a Grassmann-distance framework that compares the right nullspaces (steady-state fluxes) and left nullspaces (conservation laws) of stoichiometric matrices. Metabolic distances, computed as $d_{Gr(\infty,\infty)}(A,B) = \sqrt{|k-\ell| \pi^2/4 + \sum_{i=1}^{\min(k,\ell)} \theta_i^2}$, reveal functional relationships that diverge from phylogenetic similarity and cluster networks by shared metabolic processes. The framework is validated on the gut microbiome (mini-AGORA2 and E. coli knockouts), extended to human tissues, and illustrated on planetary atmospheres, demonstrating robust, cross-scale applicability. Together these results enable a principled, physically grounded atlas for comparing chemical reaction networks across biology and beyond.
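The distance formula above can be illustrated with a short NumPy sketch. This is not the authors' code. It assumes both stoichiometric matrices share the same column (reaction) ordering so that their right nullspaces live in the same ambient space, and it takes the $\theta_i$ to be the principal angles between the two nullspaces. The function names `null_space` and `grassmann_distance` and the toy matrices are hypothetical.

```python
import numpy as np


def null_space(S, rtol=1e-10):
    """Orthonormal basis (as columns) of the right nullspace of S, via SVD."""
    S = np.atleast_2d(np.asarray(S, dtype=float))
    _, s, vt = np.linalg.svd(S, full_matrices=True)
    tol = rtol * s[0] if s.size else 0.0
    rank = int(np.sum(s > tol))
    return vt[rank:].T  # shape: (n_columns, n_columns - rank)


def grassmann_distance(A, B):
    """Distance between the subspaces spanned by the orthonormal columns of A and B.

    Implements sqrt(|k - l| * pi^2 / 4 + sum_i theta_i^2), where k and l are the
    subspace dimensions and theta_i are the min(k, l) principal angles.
    """
    k, l = A.shape[1], B.shape[1]
    m = min(k, l)
    if m == 0:
        theta = np.zeros(0)
    else:
        # Singular values of A^T B are the cosines of the principal angles.
        sigma = np.linalg.svd(A.T @ B, compute_uv=False)[:m]
        theta = np.arccos(np.clip(sigma, 0.0, 1.0))
    return float(np.sqrt(abs(k - l) * np.pi**2 / 4 + np.sum(theta**2)))


# Toy networks (hypothetical): rows are metabolites, columns are reactions.
# Chain: uptake -> A, A -> B, B -> export.
S_chain = np.array([[1, -1, 0],
                    [0, 1, -1]])
# Same reactions, but metabolite B is not tracked, so the export flux is unconstrained.
S_open = np.array([[1, -1, 0]])

N_chain = null_space(S_chain)  # steady-state fluxes, dimension 1
N_open = null_space(S_open)    # steady-state fluxes, dimension 2

# The chain nullspace lies inside the open one, so every principal angle is zero
# and only the dimension-gap term contributes: approximately 1.5708 (pi / 2).
print(grassmann_distance(N_chain, N_open))
```

The same function applies to conservation laws by passing `null_space(S.T)` for each network, which gives the left nullspace.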

Abstract

Chemical reaction networks underpin biological and physical phenomena across scales, from microbial interactions to planetary atmosphere dynamics. Bacterial communities exhibit complex competitive interactions for resources, human organs and tissues demonstrate specialized biochemical functions, and planetary atmospheres can display diverse organic and inorganic chemical processes. Despite their complexities, comparing these networks methodically remains a challenge due to the vast underlying degrees of freedom. In biological systems, comparative genomics has been pivotal in tracing evolutionary trajectories and classifying organisms via DNA sequences. However, purely genomic classifications often fail to capture functional roles within ecological systems. Metabolic changes driven by nutrient availability highlight the need for classification schemes that integrate metabolic information. Here we introduce and apply a computational framework for a classification scheme of organisms that compares matrix representations of chemical reaction networks using the Grassmann distance, corresponding to measuring distances between the nullspaces of stoichiometric matrices. Applying this framework to human gut microbiome data confirms that metabolic distances are distinct from phylogenetic distances, underscoring the limitations of genetic information in metabolic classification. Importantly, our analysis of metabolic distances reveals functional groups of organisms enriched or depleted in specific metabolic processes and shows robustness to metabolically silent genetic perturbations. The generalizability of metabolic Grassmann distances is illustrated by application to chemical reaction networks in human tissue and planetary atmospheres, highlighting its potential for advancing functional comparisons across diverse chemical reaction systems.

Paper Structure

This paper contains 22 sections, 39 equations, 14 figures.

Figures (14)

  • Figure 1: Metabolic Grassmann distances are calculated by comparing nullspaces of stoichiometric matrices. Lists of chemical reactions and transport processes (a) are collected in graphs (b) where vertices and edges correspond to chemicals and processes, respectively. The tail and head of an edge carry the number of chemicals consumed and produced by the process, respectively. The graph representation in turn admits a matrix representation (c): the graph incidence or stoichiometric matrix, whose entries are these weights up to a sign that captures whether a metabolite is consumed $(-)$ or produced $(+)$. (d) Row- and column-sorted stoichiometric matrices are (e) transformed by computing their right and left nullspaces, omitting all-zero rows and columns, which correspond to metabolites and processes absent from a given network. (f) Networks are compared pairwise by applying the Grassmann distance metric (Eq. \ref{eq:GRASSMANN}) to obtain a distance matrix. Abbr: RNS = right nullspace, LNS = left nullspace.
  • Figure 2: Metabolic Grassmann clusters result from competition between the dimension gap and the angular term of the distance metric, which is robust to genetic knockout perturbations. (a) Metabolic Grassmann distance matrices are computed for organisms at different scales of genetic similarity: computationally viable Escherichia coli K-12 MG1655 in silico KOs (top), E. coli strains in mini-AGORA2 (middle), and all mini-AGORA2 organisms (bottom), along with the distributions of these distances. The distance distributions shift toward larger distances with increasing genetic diversity. (b) Joint distance matrix for all organisms considered in (a). (c) Stochastic gradient descent multidimensional scaling (sgd-MDS) embeddings are shown for all organisms considered in (a) with appropriate color schemes, where blue corresponds to non-E. coli organisms in mini-AGORA2. All distance matrices are sorted by hierarchical clustering with Ward linkage. Bacteria images were obtained and modified from Ref. LeMercier2022 under a Creative Commons Attribution 4.0 International License. (d) Squared Grassmann distances shown in (b) are decomposed into a dimension gap (left) and an angular term (right) for both nullspaces. (e) The $\log_{10}$ fold difference between these components reveals that the dimension gap dominates across clusters while the angular term dominates within clusters.
  • Figure 3: In silico Escherichia coli KO viability is readily identified by the L-Grassmann distance. Viability of E. coli KOs is assessed by flux balance analysis (FBA) and minimization of metabolic adjustment (MOMA). A network is viable if it realizes a nonzero biomass flux in any flux distribution. We observe no differences in viability arising from the choice of FBA versus MOMA. Mean silhouette scores of the E. coli distance matrix, with viability as cluster assignment, reveal that the L-Grassmann distance on open networks best captures differences in KO viability.
  • Figure 4: Inequivalence of metabolic and phylogenetic metrics in mini-AGORA2 organisms. The Jaccard distance correlates most strongly with the phylogenetic distance, which suggests that it is not the best choice for distinguishing organisms beyond their genetic differences. The line of best fit is shown in red, with the corresponding $R^2$ values at the bottom left. That the L-Grassmann distances appear quantized compared to the Jaccard metric suggests that metabolic networks, despite having different metabolic processes, display similar conservation laws.
  • Figure 5: Euclidean embeddings of the metabolic distances suggest that organisms do not form distinct metabolic niches on the basis of phyla. Mini-AGORA2 phylogenetic and metabolic distance matrices are sorted by organism phylum for the five most abundant phyla: Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Proteobacteria. We exclude three other phyla, each with one network. Mean silhouette scores $\langle s \rangle$ are computed for each distance using phyla as cluster assignments. Multidimensional scaling embeddings in $\mathbb{R}^2$ show a loss of adherence to these phyla assignments across all metabolic distances when compared to the phylogenetic distance.
  • ...and 9 more figures
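
The captions for Figures 3 and 5 score a distance matrix against a fixed cluster assignment (viability or phylum) using the mean silhouette score $\langle s \rangle$. A minimal sketch of that calculation for a precomputed distance matrix follows. It uses the standard silhouette definition and assumes at least two distinct labels. The function name `mean_silhouette` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_silhouette(D, labels):
    """Mean silhouette score for a symmetric distance matrix D with zero diagonal.

    For each item i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s_i = (b - a) / max(a, b). Items in singleton clusters get s_i = 0.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = int(own.sum())
        if n_own == 1:
            continue  # singleton cluster: score stays 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # D[i, i] = 0, so it drops out of the sum
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


# Toy distance matrix (hypothetical): items 0-1 are close, items 2-3 are close.
D = np.array([[0, 1, 10, 10],
              [1, 0, 10, 10],
              [10, 10, 0, 1],
              [10, 10, 1, 0]])

print(mean_silhouette(D, [0, 0, 1, 1]))  # labels match the structure: 0.9
print(mean_silhouette(D, [0, 1, 0, 1]))  # labels cut across it: -0.45
```

A score near 1 means the labels align with the distance structure, while a score near or below 0 means they do not, which is how the Figure 5 caption reads the loss of adherence to phylum assignments.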