COFAP: A Universal Framework for COFs Adsorption Prediction through Designed Multi-Modal Extraction and Cross-Modal Synergy

Zihan Li, Mingyang Wan, Mingyu Gao, Zhongshan Chen, Xiangke Wang, Feifan Zhang

TL;DR

The work addresses the challenge of predicting gas adsorption/separation performance for covalent organic frameworks (COFs) without gas-specific descriptors. It introduces COFAP, a universal multi-modal predictor that learns compact representations from sectional-plane pore geometry (SP-cVAE), topological fingerprints (PH-NN), and coarse-grained linker–linkage chemistry (BiG-CAE), fused via cross-attention. A weight-adjustable prioritization scheme enables application-tailored ranking, and COFAP achieves state-of-the-art predictive accuracy on the hypoCOFs dataset while delivering orders-of-magnitude faster inference than conventional simulation or ML predictors that depend on gas-specific features. The framework reveals narrow windows of pore size and surface area that maximize CH4/H2 separation performance, offers interpretable structure–property insights, and provides a scalable, transferable approach for high-throughput screening of crystalline porous materials beyond COFs. Together, these contributions enable fast, reliable screening and guide inverse design for efficient gas adsorption and separation in porous frameworks, with public data and code available for community use.

Abstract

Covalent organic frameworks (COFs) are promising adsorbents for gas adsorption and separation, but identifying the optimal structures within their vast design space requires efficient high-throughput screening. Conventional machine-learning predictors rely heavily on gas-specific features. These features are time-consuming to compute, which limits scalability and makes screening inefficient and labor-intensive. Herein, a universal COFs adsorption prediction framework (COFAP) is proposed, which extracts multi-modal structural and chemical features through deep learning and fuses these complementary features via a cross-modal attention mechanism. Without Henry coefficients or adsorption heat, COFAP sets a new state of the art (SOTA), outperforming previous approaches on the hypoCOFs dataset. Based on COFAP, we also find that high-performing COFs for separation concentrate within a narrow range of pore size and surface area. A weight-adjustable prioritization scheme is also developed to enable flexible, application-specific ranking of candidate COFs. Its efficiency and accuracy make COFAP directly applicable to the screening of crystalline porous materials.
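
To make the cross-modal fusion step concrete, the following is a minimal sketch of single-head cross-attention in which one embedding supplies the query and two auxiliary embeddings supply the keys and values, matching the role assignment given in the Figure 2(c) caption (SP-cVAE as query; PH-NN and BiG-CAE as keys and values). It is not the authors' implementation: the embedding sizes, the random projection matrices, the single head, and the use of plain concatenation in place of the fusion layer are all assumptions chosen for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(z_query, aux_embeddings, Wq, Wk, Wv):
    """Attend from one query embedding over several auxiliary embeddings.

    Returns the concatenation [projected query, attended value] and the
    attention weights. Concatenation is an assumed stand-in for a learned
    fusion layer.
    """
    q = matvec(Wq, z_query)
    keys = [matvec(Wk, z) for z in aux_embeddings]
    values = [matvec(Wv, z) for z in aux_embeddings]
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    attended = [
        sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(q))
    ]
    return q + attended, weights


def random_matrix(rows, cols, rng):
    """Hypothetical untrained projection; real weights would be learned."""
    return [
        [rng.gauss(0.0, 1.0) / math.sqrt(cols) for _ in range(cols)]
        for _ in range(rows)
    ]


rng = random.Random(0)
d_sp, d_aux, d = 8, 6, 4  # assumed embedding sizes
z_sp = [rng.gauss(0.0, 1.0) for _ in range(d_sp)]   # stands in for SP-cVAE output
z_ph = [rng.gauss(0.0, 1.0) for _ in range(d_aux)]  # stands in for PH-NN output
z_bg = [rng.gauss(0.0, 1.0) for _ in range(d_aux)]  # stands in for BiG-CAE output

Wq = random_matrix(d, d_sp, rng)
Wk = random_matrix(d, d_aux, rng)
Wv = random_matrix(d, d_aux, rng)

fused, weights = cross_attention_fuse(z_sp, [z_ph, z_bg], Wq, Wk, Wv)
print(len(fused), weights)
```

In a trained model the fused vector would be passed to the final MLP predictor; here the attention weights simply show how much each auxiliary branch contributes for a given query.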

Paper Structure

This paper contains 1 section, 11 equations, 15 figures, 38 tables.

Table of Contents

  1. Supporting Information

Figures (15)

  • Figure 1: (A) Overall workflow. (B) Sectional Plane – convolutional Variational Autoencoder (SP-cVAE): sectional planes of COFs combined with global molecular descriptors are encoded and reconstructed through an ELBO-based encoder–decoder framework, producing compact structural and chemical representations. (C) Persistent Homology – Neural Network (PH-NN): persistent-homology fingerprints combined with global structural descriptors are processed by a multilayer perceptron (MLP) to capture hidden topological structural representations. (D) Bipartite Graph – Contrastive Autoencoder (BiG-CAE): coarse-grained bipartite graphs of linkers and linkages are trained via contrastive and reconstruction learning within a GCN/MLP encoder–decoder, yielding hidden group chemical representations. (E) Feature fusion: integration of cross-modal features through a cross-attention block, followed by a fusion layer and final MLP predictor. (F) High-throughput screening: application of COFAP to adsorption and separation tasks, highlighting top-ranked hypoCOFs, feature distributions, a weight-adjustable prioritization pipeline, and the identified optimal range of pore limiting diameter (PLD), largest cavity diameter (LCD), accessible surface area ($S_{\mathrm{acc}}$), and porosity ($\phi$) for CH4/H2 separation.
  • Figure 2: (a) Illustration of the nine sectional planes used to reduce 3D COF structures to 2D views. Left column (i–ix) shows the 3D point-clouds with each plane’s orientation highlighted; right column (I–IX) presents the corresponding 2D planes produced by projecting the same structure onto each plane. The nine planes are defined by their normal vectors: (i) [1,0,0] ($x$-axis), (ii) [0,1,0] ($y$-axis), (iii) [0,0,1] ($z$-axis), (iv) [1,1,0] ($xy$-diagonal), (v) [0,1,1] ($yz$-diagonal), (vi) [1,1,1] (body diagonal, corner-to-opposite-corner), (vii) [-1,1,1] (skew diagonal across opposing corners), (viii) [2,1,0] (off-axis, skewed in the $xy$-plane), and (ix) [0,2,1] (off-axis, skewed in the $yz$-plane). The right column shows sectional planes with two channels: in the atom channel, blue, green, yellow, and purple dots represent C, O, H, and N atoms respectively, while the bond channel is uniformly shown in red. Example shown: linker100_CH$_2$_linker12_NH_qtz_relaxed_interp_2; panels (i)–(ix) on the left correspond to panels (I)–(IX) on the right. (b) Bipartite graphs are constructed with linkage nodes ($n$) and linker nodes ($l$). Linkage node positions are identified by distance-based screening of CIF geometries. (c) Weights of the three pre-trained encoders are frozen and used as fixed feature extractors in the fusion model: the SP-cVAE provides the queries, while the auxiliary branches (PH-NN and BiG-CAE) provide the keys and values for the cross-attention module.
  • Figure 3: (a) Scatter plot of unseen data (green) and seen data (yellow) for the prediction of targets related to the CH4/H2 separation task; the points are tightly distributed along the diagonal, indicating good predictive performance. (b) Bar charts of the ablation study showing the R$^2$ of the model components SP-cVAE, PH-NN, and BiG-CAE, and of the full COFAP model, in predicting the same set of targets as in (a). BiG-CAE is split into CC and non-CC cases because the linkage node ($n$) of structures whose linkers are directly connected by carbon atoms differs from that of structures connected by linkages. The remaining scatter plots and bar charts are presented in Figures \ref{fig:separation}-\ref{fig:pressure} and Figure \ref{fig:ablation_chart}, respectively.
  • Figure 4: (a) Workflow of the high-throughput screening procedure, comprising two stages: adsorption and separation. In the adsorption stage, complete ranked lists of predicted uptakes are generated to enable efficient candidate triage. In the separation stage, two derived metrics, regenerability ($R\%$) and adsorbent performance score ($\mathrm{APS}$), are normalized and linearly combined into composite scores. Top-10 candidates are then identified under different weight settings (Tables \ref{tab:0.5_0.5}, \ref{tab:0.2_0.8}, and \ref{tab:0.8_0.2}), followed by statistical aggregation of the top-100 COFs' structural features. (b) Example statistics for the top-100 COFs in the separation of CH4 and H2 for VSA. The three bar charts (from top to bottom) correspond to the following weight combinations of regenerability ($w_R$) and performance score ($w_A$): $w_R=0.5, w_A=0.5$; $w_R=0.2, w_A=0.8$; and $w_R=0.8, w_A=0.2$. They show the aggregated distributions of linker type, bond (linkage) type, and topological net. (c) Weight-sensitivity analysis for the separation task under VSA conditions. The heatmap depicts the fraction of the Top-10 list that overlaps with the baseline case ($w_R=w_A=0.5$) across the entire weight grid. Regions of high overlap indicate stable candidate sets that are robust to prioritization choices, while low-overlap regions indicate where the trade-off between $R\%$ and $\mathrm{APS}$ must be set according to application preferences.
  • Figure 5: (a) Visualization of the best COF linker110_C_linker91_C_tfg_relaxed identified in Tables \ref{tab:0.5_0.5} and \ref{tab:0.2_0.8}. (b) Visualization of the best COF linker110_C_linker94_C_jeb_relaxed identified in Table \ref{tab:0.8_0.2}.
  • ...and 10 more figures
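
The weight-adjustable prioritization described in the Figure 4 caption (normalize $R\%$ and $\mathrm{APS}$, combine them linearly with weights $w_R$ and $w_A$, then compare top-$k$ lists against the $w_R=w_A=0.5$ baseline) can be sketched as follows. This is an illustrative sketch only: the caption does not state which normalization is used, so min-max scaling is an assumption, and the candidate names and metric values are made-up toy data, not results from the paper.

```python
def min_max(values):
    """Scale values to [0, 1]; assumed normalization, not stated in the source."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(regen, aps, w_r, w_a):
    """Linear combination of normalized regenerability and APS."""
    r_norm = min_max(regen)
    a_norm = min_max(aps)
    return [w_r * r + w_a * a for r, a in zip(r_norm, a_norm)]


def top_k(names, scores, k):
    """Return the k candidate names with the highest composite score."""
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order[:k]]


def overlap_fraction(candidates, baseline):
    """Fraction of the baseline top-k list that also appears in candidates."""
    return len(set(candidates) & set(baseline)) / len(baseline)


# Hypothetical toy data: five candidates with invented R% and APS values.
names = ["cof_A", "cof_B", "cof_C", "cof_D", "cof_E"]
regen = [90.0, 70.0, 50.0, 80.0, 60.0]
aps = [10.0, 45.0, 50.0, 30.0, 20.0]

k = 2
baseline = top_k(names, composite_scores(regen, aps, 0.5, 0.5), k)
aps_heavy = top_k(names, composite_scores(regen, aps, 0.2, 0.8), k)
regen_heavy = top_k(names, composite_scores(regen, aps, 0.8, 0.2), k)

print("baseline:", baseline)
print("w_R=0.2, w_A=0.8:", aps_heavy, overlap_fraction(aps_heavy, baseline))
print("w_R=0.8, w_A=0.2:", regen_heavy, overlap_fraction(regen_heavy, baseline))
```

Sweeping `w_r` over a grid and recording `overlap_fraction` for each setting would give a one-dimensional analogue of the weight-sensitivity heatmap: high values mark weightings for which the shortlist is stable, low values mark weightings where the choice of weights changes which candidates are selected.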