Commutative Algebra Modeling in Materials Science -- A Case Study on Metal-Organic Frameworks (MOFs)

Caleb Simiyu Khaemba, Hongsong Feng, Dong Chen, Chun-Long Chen, Guo-Wei Wei

TL;DR

This work introduces category-specific commutative algebra (CSCA) as the first algebraic modeling framework for MOFs, translating multi-scale chemical connectivity into persistent facet ideals and f-vector descriptors. By partitioning atoms into chemically meaningful categories and constructing alpha-filtrations, CSCA yields category-aware, fixed-length embeddings that feed a gradient-boosting learner to predict MOF adsorption properties. The approach achieves competitive accuracy across four properties while enhancing interpretability through explicit algebraic and combinatorial descriptors tied to chemical categories. The method offers a rigorous, generalizable paradigm for structure–property relationships in porous materials and a new avenue for data-efficient, interpretable discovery in materials science.
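The category-then-filtration idea can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it swaps in a Vietoris-Rips filtration (a simplex appears at its longest edge length) in place of the alpha filtration, it covers only the $f$-vector curves and not the facet ideal descriptors, and the category names, toy coordinates, thresholds, and summary statistics (min, max, mean) are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist
from statistics import mean

def rips_filtration(points, max_dim=3):
    """Map each simplex (tuple of point indices) to its filtration value.
    Stand-in for an alpha filtration: a simplex enters at its longest edge."""
    filt = {}
    for k in range(1, max_dim + 2):  # k vertices give a (k-1)-simplex
        for simplex in combinations(range(len(points)), k):
            if k == 1:
                filt[simplex] = 0.0
            else:
                filt[simplex] = max(
                    dist(points[i], points[j])
                    for i, j in combinations(simplex, 2)
                )
    return filt

def f_vector_curve(filt, dim, thresholds):
    """Count simplices of dimension `dim` present at each threshold."""
    values = [v for s, v in filt.items() if len(s) == dim + 1]
    return [sum(1 for v in values if v <= t) for t in thresholds]

def embed(atoms, categories, thresholds):
    """Fixed-length embedding: 3 statistics x 3 dimensions per category."""
    features = []
    for name in sorted(categories):
        pts = [xyz for element, xyz in atoms if element in categories[name]]
        filt = rips_filtration(pts)
        for dim in (1, 2, 3):  # edges, triangles, tetrahedra
            curve = f_vector_curve(filt, dim, thresholds)
            features += [min(curve), max(curve), mean(curve)]
    return features

# Hypothetical toy structure: two metal atoms and four oxygen atoms.
atoms = [
    ("Zn", (0.0, 0.0, 0.0)), ("Zn", (3.0, 0.0, 0.0)),
    ("O", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)),
    ("O", (0.0, 1.0, 0.0)), ("O", (0.0, 0.0, 1.0)),
]
categories = {"metal": {"Zn"}, "oxygen": {"O"}}  # hypothetical categories
thresholds = [0.5, 1.0, 1.5]

features = embed(atoms, categories, thresholds)
print(len(features))  # 2 categories x 3 dimensions x 3 statistics
```

Because every category contributes the same number of statistics regardless of how many atoms it contains, structures of different sizes map to vectors of equal length, which is what allows a standard tabular learner such as gradient boosting to consume them.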

Abstract

Metal-organic frameworks (MOFs) are an important class of crystalline, highly porous materials whose hierarchical geometry and chemistry hinder interpretable prediction of material properties. Commutative algebra is a branch of abstract algebra that has rarely been applied in data and materials science. We introduce the first commutative algebra modeling and prediction framework in materials science. Specifically, category-specific commutative algebra (CSCA) is proposed as a new framework for MOF representation and learning. It integrates element-based categorization with multiscale algebraic invariants to encode both local coordination motifs and global network organization of MOFs. These algebraically consistent, chemically aware representations enable compact, interpretable, and data-efficient modeling of MOF properties such as Henry's constants and uptake capacities for common gases. Compared with traditional geometric and graph-based approaches, CSCA achieves comparable or superior predictive accuracy while substantially improving interpretability and stability across datasets. By aligning commutative algebra with the chemical hierarchy, CSCA establishes a rigorous and generalizable paradigm for understanding structure–property relationships in porous materials and provides a nonlinear algebra-based framework for data-driven materials discovery.

Paper Structure

This paper contains 23 sections, 28 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Category-Specific Commutative Algebra (CSCA) pipeline for MOF property prediction. (a) Starting point: each MOF structure is uniformly rescaled to a cubic supercell of side $64\,\text{\AA}$. (b) Category simplicial complex construction: for each atom group $C_i \in \{C_a,\dots,C_h,C_{\text{all}}\}$, atoms of that category are extracted and used to form the simplicial complex $K(C_i)$ using the $\alpha$-complex filtration. (c) Algebraic curve generation: along the filtration parameter $\alpha$, we compute two types of descriptors: facet interval curves $FI_r(K(C_i);\alpha)$ for dimensions $r=0,1$ (capturing connected components and loops) and $f$-vector curves $f_s(K(C_i);\alpha)$ for dimensions $s=1,2,3$ (capturing counts of edges, triangles, and tetrahedra). (d) Feature vectorization: summary statistics of these curves are concatenated across all categories and dimensions into a unified feature representation. (e) Model prediction: the resulting feature vectors are used as input to a gradient boosting model, which predicts adsorption and transport properties of MOFs. This stepwise framework connects MOF structure to predictive machine learning via CSCA.
  • Figure 2: Top five most frequent elements within each compositional category $C_a, \dots, C_h$ in the CoRE MOF 2019 dataset. Each bar represents the number of frameworks containing a given element. The distribution reveals prevalent elements in each chemical group, such as alkali and alkaline earth metals in $C_a$, transition metals in $C_b$, and light nonmetals (C, N, O, H) in $C_e, \dots, C_h$. These trends form the compositional backbone of the CSCA representation, linking chemical diversity to category-specific algebraic features used for adsorption property prediction.
  • Figure 3: Element fraction ridgeline plots across four MOF property datasets. For each element, two distributions are shown: filled line $=$ top $25\%$ (MOFs with high property values) and dashed line $=$ bottom $25\%$ (MOFs with low property values). A rightward shift of the filled curve indicates enrichment in high-value MOFs. Observed trends: C and H are enriched; N shows mild enrichment; F, Cl, Cu, and Zn exhibit little or no shift.
  • Figure 4: Predicted versus true values for the four property datasets: (a) Henry's constants for $\mathrm{N}_2$; (b) Henry's constants for $\mathrm{O}_2$; (c) uptake capacities for $\mathrm{N}_2$; and (d) uptake capacities for $\mathrm{O}_2$. Each point represents a single MOF sample, where the predicted value $\hat{y}_i$ is obtained from the trained category-specific commutative algebra (CSCA) model using the corresponding feature vector of that sample. Points are colored according to their dominant atomic category ($C_a,\dots, C_h$ and $C_{\mathrm{all}}$), determined from atomic composition as defined in Table \ref{main:tab:3}. All MOF samples are predicted by the same model; colors therefore indicate the primary chemical group to which each sample belongs rather than different feature types or separate models. Each panel reports two evaluation metrics in the upper-left corner: the mean absolute error (MAE) and the coefficient of determination ($R^2$), where $R^2$ measures the proportion of variance in the true values $y_i$ explained by the predicted values $\hat{y}_i$.
  • Figure 5: (a) t-SNE of category-specific commutative algebra features. Background points show the dataset; colored markers highlight the maxima and minima for the four properties. We computed a two-dimensional t-SNE embedding with perplexity $35$, $1500$ iterations, and random seed $42$ on the standardized $\alpha$ facet and $f$-vector features. Solid contours indicate kernel density level sets at the $85$ percent and $95$ percent quantiles. (b) Representative MOFs at property extrema. The selected structures illustrate the diversity of network geometries, pore architectures, and elemental compositions associated with extreme adsorption behaviors. Specifically, MUZKAV exhibits the highest Henry’s constant for N$_2$ ($7.89\times10^{-6}$ mol kg$^{-1}$ Pa$^{-1}$), while KIFKEQ attains the maximum Henry’s constant for O$_2$ ($9.79\times10^{-6}$ mol kg$^{-1}$ Pa$^{-1}$). ELOZEK shows the lowest Henry’s constants for both N$_2$ and O$_2$ ($8.86\times10^{-8}$ mol kg$^{-1}$ Pa$^{-1}$) and also has the minimum uptake capacities for N$_2$ and O$_2$ (0.0085 mol kg$^{-1}$ and 0.0086 mol kg$^{-1}$, respectively). In contrast, MUVHER and PORVUO exhibit the highest uptake capacities for N$_2$ (0.98 mol kg$^{-1}$) and O$_2$ (1.11 mol kg$^{-1}$), respectively. These structures highlight the range of adsorption responses captured by the CSCA framework.
  • ...and 5 more figures
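
The two metrics reported in the Figure 4 panels can be computed as in the following sketch. It assumes the common definitions $\mathrm{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$; the caption does not give the exact formula the authors used, and the sample values below are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only; not taken from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

Under this definition a perfect predictor gives $R^2 = 1$, and a predictor that always returns the mean of the true values gives $R^2 = 0$.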