LLM-driven discovery for carbon allotropes with bond-network entropy

Yuzhou Hao, Yujie Liu, Xuejie Li, Turab Lookman, Xiangdong Ding, Jun Sun, Zhibin Gao

Abstract

The discovery of novel carbon allotropes with tailored thermal and mechanical properties is critical for advanced thermal management. However, exploring the vast configurational space of carbon using \textit{ab initio} calculations remains computationally prohibitive. Motivated by the rich topological landscape of carbon, where the competition between $sp$, $sp^2$, and $sp^3$ hybridization states dictates material performance, we introduce a hybridization entropy descriptor to guide the search beyond conventional forms. Here, we establish a closed-loop AI framework that synergizes a Large Language Model (LLM) for structural generation with a Machine Learning Potential (MLP) for accelerated evaluation. Leveraging CrystaLLM to generate candidates and an iteratively refined MLP for high-fidelity validation, we screened thousands of structures and identified several stable allotropes with exotic properties. Specifically, we report ``yne-diamond C$_{12}$'' and ``yne-hex-diamond C$_{8}$'', which exhibit extreme thermal anisotropy and ultralow in-plane shear stiffness arising from their mixed $sp$-$sp^3$ hybridization. Furthermore, we discovered a complex $sp$-$sp^2$-$sp^3$ hybridized C$_{12}$ phase that combines metallic conductivity with an anomalous negative Poisson's ratio. Notably, we identified a superhard phase (C16_3) with a calculated Vickers hardness (103.3 GPa) exceeding that of diamond (96 GPa). Microscopic analysis reveals that thermal transport in these materials is governed by the interplay between rigid frameworks and flexible linkers. This work expands the known carbon phase space and demonstrates the efficacy of coupling generative AI with machine learning potentials for the accelerated inverse design of functional materials.
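The abstract does not define the hybridization entropy descriptor. One plausible reading, offered here only as a minimal sketch, is a Shannon entropy over the fractions of $sp$, $sp^2$, and $sp^3$ atoms in a structure. The function name `hybridization_entropy`, the coordination-number mapping, and the choice of natural logarithm are all assumptions made for illustration, not the authors' definition.

```python
import math
from collections import Counter

# Assumption: hybridization is inferred from the coordination number
# (2 -> sp, 3 -> sp2, 4 -> sp3). The paper's actual descriptor may differ.
COORD_TO_HYBRID = {2: "sp", 3: "sp2", 4: "sp3"}


def hybridization_entropy(coordination_numbers):
    """Shannon entropy (in nats) of the sp/sp2/sp3 atom fractions.

    Atoms whose coordination number is not 2, 3, or 4 are ignored.
    Returns 0.0 for a single-hybridization or empty structure.
    """
    labels = [COORD_TO_HYBRID[c] for c in coordination_numbers
              if c in COORD_TO_HYBRID]
    if not labels:
        return 0.0
    total = len(labels)
    fractions = [n / total for n in Counter(labels).values()]
    # sum_i p_i * ln(1 / p_i), equivalent to -sum_i p_i * ln(p_i)
    return sum(p * math.log(1.0 / p) for p in fractions)


# Hypothetical inputs: a purely 4-coordinated cell and a half sp / half sp3 cell.
pure_sp3 = hybridization_entropy([4] * 8)
mixed_sp_sp3 = hybridization_entropy([2, 2, 4, 4])
```

Under this reading, a structure with a single hybridization state scores zero, and the score peaks at $\ln 3$ when the three states are equally populated, so a search biased toward higher values would favor mixed-hybridization networks.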

Paper Structure (4 sections, 2 equations, 9 figures)

Figures (9)

  • Figure 1: Schematic illustration of the closed-loop AI-driven materials discovery workflow. The framework integrates two synergistic active-learning cycles. The generative cycle (left) utilizes a Large Language Model (CrystaLLM) to propose and screen candidate structures with varying atom counts (C$_1$–C$_{100}$), followed by fine-tuning on DFT-verified stable phases. The potential training cycle (right) iteratively refines the Machine Learning Potential (MLP) using on-the-fly data generation. This combined pipeline enables rapid high-throughput screening and accurate thermal/mechanical property evaluation via GPUMD and ShengBTE simulations.
  • Figure 2: Training performance and dataset diversity of the carbon Machine Learning Potential (NEP). (a) Evolution of loss functions (including L$_1$/L$_2$ regularization) and RMSE values for energy, force, and virial versus training steps, illustrating model convergence. (b) Distribution of atomic environments in the descriptor space, color-coded by energy per atom, demonstrating comprehensive structural coverage. (c)–(g) Representative structures from the training set, including fullerenes (c$_1$-c$_6$), layered graphene (d$_1$-d$_3$), diamond (e$_1$-e$_3$), $sp$-hybridized carbon (f$_1$-f$_8$), and diverse 3D frameworks (g$_1$-g$_9$).
  • Figure 3: Validation of the NEP model against DFT benchmarks. (a) Visualization of atomic environments in the descriptor space, showing distinct clustering for carbyne (red), graphene (light blue), and diamond (dark blue). (b)–(d) Parity plots comparing NEP predictions with DFT calculations for (b) energy per atom, (c) virial stress, and (d) atomic forces. The diagonal lines indicate perfect agreement.
  • Figure 4: Phonon dispersion relations for representative carbon allotropes. Comparison between predictions from the NEP model (dashed blue lines) and DFT benchmarks (solid red lines). The panels show: (a) cubic diamond, (b) hexagonal diamond, (c) monolayer graphene, (d) body-centered cubic BC8, (e) monolayer quasi-hexagonal fullerene, and (f) body-centered tetragonal C$_4$.
  • Figure 5: Newly discovered stable carbon allotropes and their phonon spectra. (a)–(f) Representative structures generated by the LLM alongside their phonon dispersion relations calculated by the NEP. The structures are labeled as: (a) C3_6, (b) C24_4, (c) C22_6, (d) C10_13, (e) C16_3, and (f) C52_15. Here, the notation C$n$\_$m$ denotes the $m$-th predicted candidate from a generative prompt for $n$ atoms in the conventional cell. The absence of imaginary frequencies confirms the dynamical stability of all shown phases.
  • ...and 4 more figures
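
The stability criterion stated in the Figure 5 caption (no imaginary phonon frequencies) can be expressed as a simple screening test. The following is a minimal sketch, assuming frequencies are supplied in THz with imaginary modes encoded as negative numbers, a convention many phonon codes use. The function name `is_dynamically_stable` and the default tolerance are hypothetical and are not taken from the paper.

```python
def is_dynamically_stable(frequencies_thz, tolerance_thz=0.05):
    """Return True if no phonon mode is imaginary beyond a small tolerance.

    frequencies_thz: iterable of rows, one per q-point, each holding the
        branch frequencies in THz. Imaginary modes are assumed to be
        encoded as negative values.
    tolerance_thz: assumed numerical slack for small negative acoustic
        frequencies near the zone center; the value is illustrative.
    """
    return all(f >= -tolerance_thz
               for row in frequencies_thz
               for f in row)


# Hypothetical dispersion data: two q-points with three branches each.
stable_example = is_dynamically_stable([[0.0, 0.0, -0.01], [5.2, 7.9, 12.4]])
unstable_example = is_dynamically_stable([[0.0, 0.0, 0.0], [-3.1, 7.9, 12.4]])
```

The tolerance exists because numerical noise often produces slightly negative acoustic frequencies near the zone center that do not indicate a real instability.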