CARBON-2D Topological Descriptor (C2DTD): An Interpretable and Physics-Informed Representation for Two-Dimensional Carbon Networks

Felipe Hawthorne, Marcelo Lopes Pereira Junior, Fabiano Manoel de Andrade, Cristiano Francisco Woellner, Raphael Matozo Tromer

Abstract

Two-dimensional (2D) carbon networks, from pristine graphene to defect-rich and amorphous monolayers, exhibit a complex structure-energy landscape governed not only by local bonding but also by medium-range order and network topology. Capturing these multi-scale effects in a compact, interpretable, and data-efficient manner remains a major challenge for machine learning (ML) in low-dimensional materials. In this work, we introduce the CARBON-2D Topological Descriptor (C2DTD), a physically informed structural representation specifically designed for 2D carbon systems. The descriptor integrates local geometric statistics, a compact radial structural signature, and explicit primitive ring topology into a fixed-length, invariant vector that is both computationally efficient and directly interpretable. Benchmarking on diverse datasets of 2D carbon allotropes and defect-engineered graphene sheets demonstrates that C2DTD achieves robust predictive performance in small-data regimes, outperforming generic high-dimensional featurization schemes while preserving physical transparency. Unsupervised manifold analysis reveals a smoother alignment between descriptor space and the DFT energy landscape, and feature-importance and ablation studies confirm that ring topology emerges as a dominant energetic driver, particularly under vacancy-induced reconstruction. Furthermore, controlled simulations with 5-15% random vacancies show that C2DTD naturally captures the progressive transition from hexagon-dominated graphene to topologically disordered networks, enabling both dataset-level and structure-specific interpretation. Owing to its compactness, interpretability, and strong physics-based inductive bias, C2DTD provides a fast and generalizable framework for data-driven modeling, defect analysis, and high-throughput screening of 2D carbon materials.
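
To make the idea of a fixed-length, invariant vector built from three feature blocks concrete, the following is a minimal sketch only, not the authors' implementation. The function names, the 1.8 Å bond cutoff, the 12-bin radial histogram, the ring sizes 3 to 8, and the absence of periodic boundary conditions are all assumptions made for illustration. Ring sizes are taken as an input here; in practice they would come from a separate primitive-ring search, which is not shown.

```python
import numpy as np

def pairwise_distances(positions):
    # Full N x N distance matrix (no periodic images; an assumption of this sketch).
    diff = positions[:, None, :] - positions[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def local_geometry_stats(dist, cutoff=1.8):
    # Mean coordination number plus mean and spread of bond lengths.
    # The cutoff (in angstrom) is a hypothetical choice, not taken from the paper.
    bonded = (dist > 0.0) & (dist < cutoff)
    coordination = bonded.sum(axis=1)
    bond_lengths = dist[np.triu(bonded, k=1)]
    if bond_lengths.size == 0:
        return np.array([coordination.mean(), 0.0, 0.0])
    return np.array([coordination.mean(), bond_lengths.mean(), bond_lengths.std()])

def radial_signature(dist, r_max=6.0, n_bins=12):
    # Normalized histogram of unique pair distances up to r_max.
    upper = dist[np.triu_indices_from(dist, k=1)]
    hist, _ = np.histogram(upper, bins=n_bins, range=(0.0, r_max))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)

def ring_fractions(ring_sizes, sizes=(3, 4, 5, 6, 7, 8)):
    # Fraction of rings of each size; ring_sizes is supplied by an external ring finder.
    counts = np.array([sum(1 for s in ring_sizes if s == k) for k in sizes], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def toy_descriptor(positions, ring_sizes):
    # Concatenate the three blocks into one fixed-length vector (3 + 12 + 6 entries).
    dist = pairwise_distances(np.asarray(positions, dtype=float))
    return np.concatenate([
        local_geometry_stats(dist),
        radial_signature(dist),
        ring_fractions(ring_sizes),
    ])

# Usage: a single isolated hexagon with a 1.42 angstrom bond length.
angles = np.arange(6) * np.pi / 3
hexagon = 1.42 * np.column_stack([np.cos(angles), np.sin(angles)])
vec = toy_descriptor(hexagon, ring_sizes=[6])
print(vec.shape)  # (21,)
```

Because every block depends only on interatomic distances or ring counts, the resulting vector is unchanged under rigid rotations, translations, and atom reordering, which is the sense of "invariant" this sketch assumes.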

Paper Structure

This paper contains 7 sections, 16 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Overview of the CARBON-2D Topological Descriptor framework, illustrating (a) the structural diversity of the investigated systems, (b) the descriptor extraction pipeline, (c) the physical mapping of the feature space, and (d) the average ring-size distribution across the dataset.
  • Figure 2: Parity plots of predicted versus DFT total energies for 2D carbon structures evaluated at a test size of 0.3, comprising (a) the predictions derived from the C2DTD framework and (b) the predictions obtained utilizing matminer structural features. The dashed lines indicate perfect agreement between predicted and reference energies.
  • Figure 3: Interpretability and ablation analysis of the topological descriptor evaluated at a test fraction of 0.3, depicting (a) the top ten features ranked by relative algorithmic gain, (b) the Spearman rank correlation matrix between normalized ring fractions and total energy, and (c-f) the parity plots and predictive metrics for the full descriptor, the isolated ring statistics, the isolated radial distribution function, and the isolated local geometry features, respectively.
  • Figure 4: Unsupervised dimensionality reduction of the descriptor spaces colored by DFT total energy, showing (a) the principal component analysis projection for the C2DTD framework, (b) the corresponding principal component analysis projection for the matminer structural features, and (c) the non-linear t-distributed stochastic neighbor embedding for the C2DTD framework.
  • Figure 5: Interpretability analysis of the topological descriptor evaluated on a defect-engineered graphene dataset comprising 150 structurally relaxed configurations, illustrating (a) the heatmap of primitive ring fractions across the systematically perturbed dataset, (b-c) the ring distribution for a representative low-defect structure with a 5% vacancy concentration, (d-e) the corresponding distribution for an intermediate-defect configuration with a 10% vacancy concentration, and (f-g) the structural profile of a high-defect network containing a 15% vacancy concentration.
  • ...and 1 more figures