
CatFlow: Co-generation of Slab-Adsorbate Systems via Flow Matching

Minkyu Kim, Nayoung Kim, Honghui Kim, Sungsoo Ahn

TL;DR

The paper tackles the challenge of designing heterogeneous catalysts by enabling end-to-end co-generation of slab structures and adsorbate coordinates. It introduces CatFlow, a flow-matching framework built on a factorized slab-adsorbate representation (primitive cell, transformation matrix, vacuum scaling factor, and adsorbate) that reduces modeling complexity while preserving surface orientation. A transformer-based architecture with continuous and discrete flow components learns the joint distribution for both de novo generation and structure prediction, and the model is evaluated on OC20 against autoregressive and sequential baselines. CatFlow produces structures with higher structural fidelity and more realistic adsorption energies that lie closer to thermodynamic local minima, enabling efficient exploration of catalyst surfaces. By tightly coupling surface geometry and adsorption phenomena in a single end-to-end model, the approach lays the groundwork for scalable inverse design and multi-adsorbate catalyst discovery.
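The TL;DR says CatFlow learns its joint distribution with continuous and discrete flow components. As a rough illustration of the continuous part only, here is a minimal NumPy sketch of conditional flow matching with a straight-line path from noise ($t=0$) to data ($t=1$), matching the time convention in Figure 1. The linear path, the function names (`interpolate`, `cfm_loss`, `euler_sample`, `oracle`), and the treatment of coordinates as plain Euclidean vectors are assumptions for illustration, not details taken from the paper; the discrete component for atom types is not shown.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Straight-line path (assumed): x_t = (1 - t) * x0 + t * x1."""
    return (1.0 - t) * x0 + t * x1


def cfm_loss(velocity_fn, x0, x1, t):
    """Conditional flow-matching loss for the straight-line path.

    The model velocity at x_t is regressed onto the path's constant
    velocity x1 - x0 using a mean squared error.
    """
    xt = interpolate(x0, x1, t)
    target = x1 - x0
    return float(np.mean((velocity_fn(xt, t) - target) ** 2))


def euler_sample(velocity_fn, x0, n_steps=50):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


# Toy check: for one fixed target x1, the exact velocity field of the
# straight-line path is (x1 - x) / (1 - t). A trained network (hypothetically,
# a transformer over the factorized representation) would replace this oracle.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 3))  # noise sample, e.g. 4 adsorbate atoms in 3D
x1 = rng.normal(size=(4, 3))  # stand-in for target coordinates


def oracle(x, t):
    return (x1 - x) / (1.0 - t)


print(cfm_loss(oracle, x0, x1, t=0.3))            # ~0: oracle matches the target velocity
print(np.allclose(euler_sample(oracle, x0), x1))  # True: Euler integration lands on x1
```

Because the straight-line velocity `x1 - x0` does not depend on `t`, the regression target is cheap to compute. Periodic (fractional) coordinates would typically need a wrapped difference instead of a plain subtraction, which this sketch ignores.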

Abstract

Discovering heterogeneous catalysts tailored for specific reaction intermediates remains a fundamental bottleneck in materials science. While traditional trial-and-error methods and recent generative models have shown promise, they struggle to capture the intrinsic coupling between surface geometry and adsorbate interactions. To address this limitation, we propose CatFlow, a flow-matching-based framework for de novo design and structure prediction of heterogeneous catalysts. Our model operates on a primitive-cell-based factorized representation of the slab-adsorbate complex, which reduces the number of learnable variables by 9.2× on average while explicitly encoding the surface orientation of the slab-adsorbate interface. Experiments on the Open Catalyst 2020 (OC20) dataset demonstrate that CatFlow significantly improves the structural fidelity of generated catalysts compared to autoregressive and sequential baselines. Further experiments show that the generated structures accurately capture the adsorption energy distributions of physically plausible interfaces and lie closer to thermodynamic local minima.
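To see why working on the primitive cell shrinks the learning problem, here is a back-of-the-envelope count. It assumes, purely for illustration, that each atom contributes three coordinates plus one species label and that a lattice is nine numbers; the paper's exact variable accounting, and therefore its 9.2× average, may differ. The one fact the sketch relies on is that a supercell with lattice $\bm{M}\,\bm{L}_{\mathrm{prim}}$ contains $|\det \bm{M}|$ copies of the primitive cell. The function names are hypothetical.

```python
import numpy as np


def slab_atom_count(n_prim_atoms, M):
    """Atoms in a supercell whose lattice is M @ L_prim: |det M| primitive cells."""
    n_cells = int(round(abs(np.linalg.det(np.asarray(M, dtype=float)))))
    return n_cells * n_prim_atoms


def vars_direct(n_slab_atoms, n_ads_atoms):
    # Assumed direct representation: slab lattice (9) + 4 numbers per atom.
    return 9 + 4 * (n_slab_atoms + n_ads_atoms)


def vars_factorized(n_prim_atoms, n_ads_atoms):
    # Assumed factorized representation: primitive lattice (9) + primitive atoms
    # + transformation matrix M (9) + vacuum scaling factor k_vac (1) + adsorbate.
    return 9 + 4 * n_prim_atoms + 9 + 1 + 4 * n_ads_atoms


# Hypothetical example: 2-atom primitive cell, M = diag(2, 2, 8), 3-atom adsorbate.
M = np.diag([2, 2, 8])
n_slab = slab_atom_count(2, M)       # 64
direct = vars_direct(n_slab, 3)      # 277
factorized = vars_factorized(2, 3)   # 39
print(n_slab, direct, factorized, round(direct / factorized, 1))  # 64 277 39 7.1
```

Under these assumptions the direct count grows linearly with $|\det \bm{M}|$, whereas the factorized count stays fixed for a given primitive cell and adsorbate, which is the qualitative gap Figure 2 shows between slab and primitive-cell atom counts.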

Paper Structure (36 sections, 7 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Visualization of the co-generation trajectory conditioned on the adsorbate. We illustrate the synchronized evolution of the slab-adsorbate system from the initial noise distribution ($t=0$) to the final structure ($t=1$) for de novo generation (top) and structure prediction (bottom). The framework jointly generates the components of the factorized representation to construct the slab-adsorbate system for the target adsorbate. At each time step, the catalyst system is explicitly constructed from the generated components of the factorized representation.
  • Figure 2: Histogram of atom counts in catalyst structures. We compare the histograms of atom counts for slab structures (blue) and their corresponding primitive cells (green). The primitive cells require fewer atoms than the slab structures, reducing the number of learnable variables for the generative model.
  • Figure 3: Conceptual description of the factorized representation. The primitive cell $\mathcal{S}_{\mathrm{prim}}$ (top left) is transformed by the transformation matrix $\bm{M}$ into the slab lattice $\bm{L}_{\mathrm{slab}} = \bm{M}\,\bm{L}_{\mathrm{prim}}$ (top right). The slab structure is generated by replicating the primitive-cell atoms at every lattice translation that falls within the cell spanned by $\bm{L}_{\mathrm{slab}}$ (bottom right). The vacuum scaling factor $k_{\mathrm{vac}}$ extends $\bm{L}_{\mathrm{slab}}$ along the vertical axis to create the system cell, and the adsorbate atoms, defined by their atomic species $\bm{A}_{\mathrm{ads}}$ and atomic coordinates $\bm{X}_{\mathrm{ads}}$, are placed on the slab surface to form the complete slab-adsorbate system (bottom left).
  • Figure 4: Visualization of generated slab-adsorbate structures. We present generated samples for (a) de novo generation and (b) structure prediction tasks. The multi-view renderings (perspective, top, and left) illustrate that CatFlow constructs geometrically precise structures capable of accommodating diverse and bulky adsorbates. The accompanying adsorption energies further confirm that these generated configurations are physically reasonable and situated in stable local minima.
  • Figure 5: Adsorption energy histograms in de novo generation. Adsorption energy distributions are compared for representative adsorbates, randomly selected to cover diverse cases. Each panel shows kernel density estimation (KDE) plots for the in-distribution validation set (Val ID) (blue), CatGPT (orange), and CatFlow (green). The vertical dashed lines indicate the mean adsorption energy of each distribution. Qualitatively, the energy profiles generated by CatFlow overlap more closely with the validation set for most adsorbates, particularly for complex molecules such as *CH*CH, *NO3, *COHCOH, and *OCHCH3, whereas CatGPT shows extended tails toward positive energies, indicating that it generates unstable configurations.
  • ...and 3 more figures
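Figure 3 describes how a slab-adsorbate system is assembled from its factorized components. The NumPy sketch below follows that recipe under stated assumptions: lattice vectors are stored as rows (so $\bm{L}_{\mathrm{slab}} = \bm{M}\,\bm{L}_{\mathrm{prim}}$ is a plain matrix product), the third slab vector is the one that $k_{\mathrm{vac}}$ stretches, primitive positions are fractional, and adsorbate coordinates are Cartesian. The function names and the toy example are hypothetical, and details of the actual model such as surface termination or slab-thickness handling are not reproduced.

```python
import itertools

import numpy as np


def build_slab(L_prim, frac_prim, species_prim, M, tol=1e-6):
    """Tile primitive-cell atoms into the slab cell L_slab = M @ L_prim."""
    L_prim = np.asarray(L_prim, dtype=float)
    M = np.asarray(M, dtype=int)
    L_slab = M @ L_prim                      # rows are slab lattice vectors
    inv_slab = np.linalg.inv(L_slab)

    # Corners of the slab cell in primitive fractional units bound the
    # integer translations that can land inside it (padded by one cell).
    corners = np.array(list(itertools.product([0, 1], repeat=3))) @ M
    lo, hi = corners.min(axis=0) - 1, corners.max(axis=0) + 1

    cart, species = [], []
    for n in itertools.product(*(range(a, b + 1) for a, b in zip(lo, hi))):
        for f, z in zip(frac_prim, species_prim):
            r = (np.asarray(f, dtype=float) + n) @ L_prim  # Cartesian position
            s = r @ inv_slab                               # slab fractional coords
            # Half-open window keeps exactly one periodic image of each atom.
            if np.all(s >= -tol) and np.all(s < 1.0 - tol):
                cart.append(r)
                species.append(z)
    return L_slab, np.array(cart), species


def assemble_system(L_prim, frac_prim, species_prim, M, k_vac, A_ads, X_ads):
    """Slab + vacuum + adsorbate, following the recipe in Figure 3."""
    L_slab, X_slab, A_slab = build_slab(L_prim, frac_prim, species_prim, M)
    L_sys = L_slab.copy()
    L_sys[2] *= k_vac                        # stretch the out-of-plane vector
    X = np.vstack([X_slab, np.asarray(X_ads, dtype=float).reshape(-1, 3)])
    A = list(A_slab) + list(A_ads)
    return L_sys, X, A


# Hypothetical toy example: simple cubic cell (a = 2 Å) with element label "X",
# a 2 x 2 x 3 supercell, vacuum doubling the cell height, one O adsorbate on top.
L_sys, X, A = assemble_system(
    np.eye(3) * 2.0, [[0.0, 0.0, 0.0]], ["X"], np.diag([2, 2, 3]),
    k_vac=2.0, A_ads=["O"], X_ads=[[1.0, 1.0, 5.0]],
)
print(len(A), L_sys[2, 2])  # 13 12.0
```

The slab contains $|\det \bm{M}|$ copies of the primitive cell (12 atoms here), and because scaling only the third lattice vector leaves Cartesian positions unchanged, the extra height appears as vacuum above the topmost layer.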