HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals

Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee

TL;DR

This work tackles the lack of large, domain-specific datasets for non-Hermitian physics by introducing Poly2Graph, an end-to-end pipeline that automates the extraction of Hamiltonian spectral graphs from 1D crystal models. It constructs HSG-12M, the first large-scale dataset of spatial multigraphs with rich edge geometry, spanning 11.6 million static and 5.1 million dynamic graphs across 1401 characteristic-polynomial classes, derived from 177 TB of spectral-potential data. The study demonstrates the value of edge-aware representations for learning on spatial multigraphs, with GNNs showing strong Top-k retrieval capabilities but also revealing the need for geometry-aware, multi-edge modeling to handle scale and diversity. Overall, HSG-12M and Poly2Graph enable data-driven discovery in condensed matter physics and open avenues for geometry-aware graph learning and topology-informed inverse design beyond physics.

Abstract

AI is transforming scientific research by revealing new ways to understand complex physical systems, but its impact remains constrained by the lack of large, high-quality domain-specific datasets. A rich, largely untapped resource lies in non-Hermitian quantum physics, where the energy spectra of crystals form intricate geometries on the complex plane -- termed Hamiltonian spectral graphs. Despite their significance as fingerprints of electronic behavior, their systematic study has been intractable because it relied on manual extraction. To unlock this potential, we introduce Poly2Graph: a high-performance, open-source pipeline that automates the mapping of 1-D crystal Hamiltonians to spectral graphs. Using this tool, we present HSG-12M: a dataset containing 11.6 million static and 5.1 million dynamic Hamiltonian spectral graphs across 1401 characteristic-polynomial classes, distilled from 177 TB of spectral potential data. Crucially, HSG-12M is the first large-scale dataset of spatial multigraphs -- graphs embedded in a metric space where multiple geometrically distinct trajectories between two nodes are retained as separate edges. This also addresses a critical gap: existing graph benchmarks overwhelmingly assume simple, non-spatial edges, discarding vital geometric information. Benchmarks with popular GNNs expose new challenges in learning spatial multi-edges at scale. Beyond its practical utility, we show that spectral graphs serve as universal topological fingerprints of polynomials, vectors, and matrices, forging a new algebra-to-graph link. HSG-12M lays the groundwork for data-driven scientific discovery in condensed matter physics and opens new opportunities in geometry-aware graph learning and beyond.
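To make the notion of a spatial multigraph concrete, the following is a minimal sketch in plain Python. The class `SpatialMultigraph`, its methods, and the helper `polyline_length` are hypothetical names introduced here for illustration; they are not the HSG-12M storage format or the Poly2Graph API. The sketch only shows the defining property stated above: two nodes may be joined by several edges, and each edge keeps its own geometry.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class SpatialMultigraph:
    """Hypothetical container: nodes carry 2-D positions, edges carry polylines."""
    pos: dict = field(default_factory=dict)    # node id -> (x, y)
    edges: list = field(default_factory=list)  # (u, v, polyline) triples

    def add_node(self, node, xy):
        self.pos[node] = xy

    def add_edge(self, u, v, interior=()):
        # The stored polyline runs from u to v through optional interior points.
        self.edges.append((u, v, [self.pos[u], *interior, self.pos[v]]))

    def edges_between(self, u, v):
        # All parallel edges joining u and v, regardless of direction.
        return [e for e in self.edges if {e[0], e[1]} == {u, v}]


def polyline_length(points):
    """Euclidean length of a polyline given as a list of (x, y) points."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


g = SpatialMultigraph()
g.add_node("a", (0.0, 0.0))
g.add_node("b", (2.0, 0.0))
g.add_edge("a", "b")                         # straight segment
g.add_edge("a", "b", interior=[(1.0, 1.0)])  # detour through (1, 1)

parallel = g.edges_between("a", "b")
lengths = [polyline_length(pts) for _, _, pts in parallel]
```

A simple-graph representation would merge the two edges between `a` and `b` into one and discard both polylines; here they remain distinct and have different lengths.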

Paper Structure

This paper contains 49 sections, 45 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: Number of graphs vs. number of classes in HSG-12M compared with other graph-classification datasets. HSG-12M is the only large-scale multigraph dataset (unlike a simple graph, a multigraph allows more than one edge between a node pair), with a class diversity exceeding even that of all the simple-graph datasets. T-HSG-5M holds temporal spatial multigraphs. Table \ref{tab:dataset_comparison} gives a comprehensive comparison against 45 other datasets.
  • Figure 2: Poly2Graph pipeline. (a) The pipeline starts from a 1-D crystal Hamiltonian ${\bm{H}}(z)$ in momentum space or, equivalently, from its characteristic polynomial $P(z,E)=\det[{\bm{H}}(z)-E{\bm{I}}]$. The crystal's open-boundary spectrum depends solely on $P(z,E)$. (b) The spectral potential $\Phi(E)$ (Ronkin function) is computed from the roots of $P(z,E)=0$, following recent advances in non-Bloch band theory \cite{taiZoologyNonHermitianSpectra2023, xiongGraphMorphologyNonHermitian2023, wangAmoebaFormulation2024}. (c) The density of states $\rho(E)$ is obtained as the Laplacian of $\Phi(E)$. (d) The spectral graph is extracted from $\rho(E)$ via a morphological computer-vision pipeline. Varying the coefficients of $P(z,E)$ produces diverse graph morphologies in the real domain (d1)-(d3) and the imaginary domain (di)-(diii).
  • Figure A3: The emergence of spectral graphs. (a)-(b) show the OBC energy spectra of the non-Hermitian lattice with characteristic polynomial $P(z,E) = -z^{-2} - E - z + z^4$ at increasing system sizes $L = 50$ and $150$. In the thermodynamic limit ($L\to \infty$), the spectrum becomes a band continuum and the energy loci trace out a planar graph on the complex plane, namely the spectral graph. For this particular example, it is a 3-Cayley tree. (c) shows the corresponding density of states when $L\to \infty$.
  • Figure A4: A Gallery of Spectral Graphs. The top four rows highlight the intricate structures characteristic of spectral graphs. The bottom row illustrates a distinct phenomenon that we refer to as component fragmentation (Section \ref{sec:discussion}): some nodes that should in theory be connected are surrounded by a low density of states, which limits accurate edge detection. Such nodes are fragmented into disjoint nodes, often fragmenting an otherwise connected component. This phenomenon often occurs for high-band and long-range-hopping crystals.
  • Figure A5: Spectral Collapse & Spectral Potential. Under PBC, the spectrum usually appears as circles and loops; under OBC, it collapses into a graph skeleton. The spectral graph resides on the ridges of the potential landscape $\Phi(E)$.
  • ...and 1 more figure
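Steps (b) and (c) of the Figure 2 caption can be sketched numerically for the single-band example in the Figure A3 caption, $P(z,E) = -z^{-2} - E - z + z^4$. The sketch below is not the Poly2Graph implementation. It assumes a form of the spectral potential commonly used in the non-Bloch literature: with the roots of $z^{p}P(z,E)=0$ sorted as $|z_1| \le \dots \le |z_{p+q}|$, where $z^{-p}$ and $z^{q}$ are the lowest and highest powers and $a_q$ is the coefficient of $z^{q}$, it takes $\Phi(E) = \log|a_q| + \sum_{i=p+1}^{p+q} \log|z_i(E)|$ and $\rho(E) = \nabla^2 \Phi(E) / (2\pi)$. The sign convention, the $1/(2\pi)$ normalization, the function names, and the finite-difference step are all assumptions; the paper's conventions may differ.

```python
import numpy as np

# Example from the Figure A3 caption: P(z, E) = -z^{-2} - E - z + z^4.
# Multiplying by z^2 gives an ordinary polynomial: z^6 - z^3 - E z^2 - 1 = 0.
P_ORDER, Q_ORDER = 2, 4   # lowest / highest powers of z are z^{-2} and z^{4}
LEADING_COEFF = 1.0       # coefficient of z^4


def roots_in_z(E):
    """Roots z_i(E) of z^2 * P(z, E) = 0, sorted by increasing modulus."""
    coeffs = [1.0, 0.0, 0.0, -1.0, -E, 0.0, -1.0]  # z^6 down to z^0
    r = np.roots(coeffs)
    return r[np.argsort(np.abs(r))]


def spectral_potential(E):
    """Assumed form: log|a_q| plus the sum of log|z_i| over the q largest roots."""
    r = roots_in_z(E)
    return np.log(abs(LEADING_COEFF)) + np.sum(np.log(np.abs(r[P_ORDER:])))


def density_of_states(E, h=1e-3):
    """Five-point finite-difference Laplacian of Phi in the complex E plane,
    divided by 2*pi (assumed normalization)."""
    lap = (spectral_potential(E + h) + spectral_potential(E - h)
           + spectral_potential(E + 1j * h) + spectral_potential(E - 1j * h)
           - 4.0 * spectral_potential(E)) / h**2
    return lap / (2.0 * np.pi)


phi_far = spectral_potential(1e6)          # far from the spectrum
rho_far = density_of_states(10.0 + 10.0j)  # also far from the spectrum
```

Under these assumptions, $\Phi(E)$ approaches $\log|E|$ far from the spectrum and is harmonic there, so the density of states vanishes away from the spectral graph. Evaluating `density_of_states` on a grid of complex energies would give the kind of image from which step (d) extracts the graph.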