Quotient complex (QC)-based machine learning for 2D perovskite design

Chuan-Shen Hu, Rishikanta Mayengbam, Kelin Xia, Tze Chien Sum

TL;DR

The paper addresses the challenge of representing 2D perovskites in a way that captures both higher-order interactions and periodicity. It introduces a quotient-complex (QC) framework, defining $\overline{K} = K/\sim_V$ and constructing a homotopy-equivalent model $\widetilde{K}$ so that $H_q(\overline{K}) \cong H_q(\widetilde{K})$ for $q>1$, with corresponding relations for $q=0,1$. Persistent homology is then applied to QC filtrations to yield QC-based descriptors (QCDs) from the persistence barcodes ${\rm PB}_q(\overline{K_\bullet})$, including dimension-0 and dimension-1 components and element-specific variants. A gradient-boosted tree model (QC-GBT) trained on these QCDs and tested on the NMSE 2D perovskite dataset delivers bandgap predictions that are competitive with or superior to state-of-the-art SOAP and GNN approaches, which points to the importance of periodicity information for predicting material functionality. The authors provide open data and code, supporting accelerated design of 2D perovskites and other periodic materials.
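
As a rough illustration of how a persistence barcode can become a fixed-length input for a tree model, the sketch below computes a few summary statistics and a binned count of live bars. The function name `barcode_features`, the choice of statistics, and the binning scheme are all assumptions made for illustration; the paper's actual QCD construction may differ.

```python
import math
from statistics import mean


def barcode_features(bars, r_max, n_bins):
    """Turn a persistence barcode into a fixed-length feature vector.

    bars   : list of (birth, death) pairs; death may be math.inf
    r_max  : largest filtration value (infinite bars are clipped to it)
    n_bins : number of equal-width bins on [0, r_max]

    NOTE: the statistics and bins here are illustrative assumptions,
    not the descriptors defined in the paper.
    """
    # Clip infinite bars so that every interval has a finite length.
    clipped = [(b, min(d, r_max)) for b, d in bars]
    lengths = [d - b for b, d in clipped]

    # Summary statistics: count, total length, longest bar, mean length.
    stats = [
        len(bars),
        sum(lengths),
        max(lengths, default=0.0),
        mean(lengths) if lengths else 0.0,
    ]

    # Betti-curve style counts: bars alive at the midpoint of each bin.
    width = r_max / n_bins
    alive = []
    for i in range(n_bins):
        mid = (i + 0.5) * width
        alive.append(sum(1 for b, d in clipped if b <= mid < d))

    return stats + alive


# Hypothetical dimension-0 barcode with one component that never dies.
bars_dim0 = [(0.0, 1.0), (0.0, 2.5), (0.0, math.inf)]
features = barcode_features(bars_dim0, r_max=4.0, n_bins=4)
print(features)
```

Vectors of this kind, computed per homology dimension and per element type and then concatenated, would be one plausible way to feed barcodes into a gradient-boosted tree regressor.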

Abstract

With remarkable stability and exceptional optoelectronic properties, two-dimensional (2D) halide layered perovskites hold immense promise for revolutionizing photovoltaic technology. Presently, inadequate representations have substantially impeded the design and discovery of 2D perovskites. In this context, we introduce a novel computational topology framework termed the quotient complex (QC), which serves as the foundation for the material representation. Our QC-based features are seamlessly integrated with learning models for the advancement of 2D perovskite design. At the heart of this framework lie the quotient complex descriptors (QCDs), representing a quotient variation of simplicial complexes derived from the material's unit cell and periodic boundary conditions. Differing from prior material representations, this approach encodes higher-order interactions and periodicity information simultaneously. Based on the well-established New Materials for Solar Energetics (NMSE) databank, our QC-based machine learning models exhibit superior performance against all existing counterparts. This underscores the paramount role of periodicity information in predicting material functionality, while also showcasing the remarkable efficiency of the QC-based model in characterizing material structural attributes.

Paper Structure

This paper contains 27 sections, 14 theorems, 46 equations, 12 figures, 5 tables.

Key Result

Theorem A.1.1

Let $X$ and $Y$ be topological spaces, $A$ a subspace of $X$, and $f: A \rightarrow Y$ be a continuous map. Let $\phi$ be the continuous map defined in Equation: Universal property map-1. Then, the following assertions hold.

Figures (12)

  • Figure 1: Illustration of a crystal structure (Panel A) and its finite representations (Panel B), including the unit cell and supercell representations. The entire crystal structure is built by repeating the unit cell or supercell. In particular, the supercell depicted in B represents a $2 \times 2 \times 2$ extension of the unit cell. Based on the periodicity of duplicated atoms, a quotient graph representation is obtained (Panel C). Furthermore, by leveraging higher-dimensional objects (e.g., 2-dimensional surfaces) in the quotient graph, the quotient complex (Panel D) is established. Panel E illustrates a schematic image of a quotient complex filtration derived from the crystal structure. The crystal, unit cell, and supercell structures were visualized using the VESTA program [momma2011vesta].
  • Figure 2: Flowchart illustrating the quotient complex (QC)-based ML model used in this study to predict material bandgaps for 2D perovskite structures. In this work, the QC of a given material structure is computed atom-wise. The second panel of the flowchart depicts spheres of varying radii centered at the iodine atoms of the material, accompanied by a schematic visualization of the corresponding QC filtration. The third panel illustrates the persistence barcodes (PBs) of the QC filtration. Each PB collects intervals (or bars) whose beginning and ending values in the multiscale process are referred to as the birth and death of the interval. The feature vector is generated from the births and deaths of the intervals within the PB and serves as the input vector for the ML models. Finally, a gradient boosting tree (GBT) model is applied to the QC-based descriptors to predict material bandgaps. Perovskite structures were visualized using the VESTA program [momma2011vesta].
  • Figure 3: A 2D illustration of a unit cell, supercell, periodic motif, periodic simplicial complex, and periodic graph. In Panel A, the red arrows ($v_1$ and $v_2$) accompanying the unit cell represent a basis of the 2-dimensional Euclidean space $\mathbb{R}^2$. Within the shaded region of the supercell, the darker region denotes the unit cell, the parallelepiped spanned by the basis vectors, which contains $3$ points. In this example, a $3 \times 2$ supercell is illustrated, comprising a union of $6$ translated cells, totaling $3 \times 3 \times 2 = 18$ points. Panel B displays examples of $2$-dimensional periodic objects derived from unit cell information: a $2$-periodic motif, a $2$-periodic graph, and a $2$-periodic simplicial complex. The shaded regions within these objects delineate finite $d$-periodic objects, which constitute the primary focus of this study.
  • Figure 4: Illustration of QCs obtained from a finite 2-periodic simplicial complex $K$ and its subcomplexes $G$ and $V$ within a two-copy supercell embedded in $\mathbb{R}^2$. In this example, three equivalence relations are defined on $K$: $\sim_K$, $\sim_G$, and $\sim_V$, where $G$ and $V$ are the subcomplexes of all 1-simplices (i.e., edges) and all 0-simplices (i.e., vertices), respectively. In the illustrations of $K$, $G$, and $V$, vertices, edges, and triangles are annotated with the same color if they are equivalent under the translation action on simplices. The QCs $V/\sim_V$, $G/\sim_G$, $G/\sim_V$, $K/\sim_K$, $K/\sim_G$, $K/\sim_V$ are illustrated in the second and third rows, with particular emphasis on $K/\sim_V$, the main focus of this study.
  • Figure 5: Illustration of a $10 \times 20 \times 30$ unit cell encompassing the motif $\{ (0,0,0), (5,10,15) \}$ and the induced persistence barcode of the quotient complex filtration. The quotient complex filtration is constructed from a Vietoris--Rips filtration based on the settings in \ref{Eq. The periodic equivalence relation}, \ref{Eq. 4M}, and \ref{Eq. Distance function of extended cells-v1}, with filtration levels ranging from $0$ to $40$.
  • ...and 7 more figures
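
The Figure 5 setting can be explored with a small sketch. It assumes an orthogonal cell, the minimum-image convention for periodic distances, and a Vietoris-Rips filtration in which an edge enters at the pairwise distance; these are simplifying assumptions, the helper names are hypothetical, and the distance function referenced in the caption may be defined differently. Only dimension-0 bars are computed, so any dimension-1 features created by periodic wrap-around are not captured here.

```python
import math
from itertools import combinations


def min_image_distance(p, q, cell):
    """Distance between p and q in an orthogonal periodic cell
    (minimum-image convention; an assumption, not the paper's definition)."""
    total = 0.0
    for a, b, length in zip(p, q, cell):
        d = abs(a - b) % length
        d = min(d, length - d)  # shorter of the direct and wrapped separations
        total += d * d
    return math.sqrt(total)


def dim0_barcode(points, cell):
    """Dimension-0 bars (birth, death) of a Vietoris-Rips filtration built on
    periodic distances, with edges entering at the pairwise distance."""
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (min_image_distance(points[i], points[j], cell), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one bar dies
            parent[ri] = rj
            bars.append((0.0, dist))
    bars.append((0.0, math.inf))  # one connected component never dies
    return bars


cell = (10.0, 20.0, 30.0)
motif = [(0.0, 0.0, 0.0), (5.0, 10.0, 15.0)]
bars = dim0_barcode(motif, cell)
print(bars)
```

Under these assumptions the two motif points are separated by $\sqrt{5^2 + 10^2 + 15^2} = \sqrt{350} \approx 18.7$, so the sketch yields one finite bar ending near $18.7$ and one infinite bar, both within the $0$ to $40$ filtration range quoted in the caption.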

Theorems & Definitions (26)

  • Theorem A.1.1
  • proof
  • Corollary A.1.2
  • proof
  • Corollary A.1.3
  • proof
  • Theorem A.2.1 [brown2006topology]
  • Corollary A.2.2
  • Definition A.2.3
  • Theorem A.2.4
  • ...and 16 more