QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic Interactions

Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban

TL;DR

QuantumCanvas reframes atomic interactions by treating two-body quantum systems as fundamental units and pairing scalar quantum descriptors with interpretable, coordinate-free orbital images. The dataset covers 2,850 diatomics across 75 elements, with 18 targets and 10-channel image representations derived from orbital populations, enabling multimodal learning that couples quantum physics with vision-based features. Benchmark results across eight architectures and 18 targets reveal modality-specific strengths and show that two-body pretraining improves convergence and generalization to molecular and crystalline domains. This work provides a principled foundation for transferable quantum learning and opens avenues to extend multimodal representations to triatomic and time-dependent regimes.

Abstract

Despite rapid advances in molecular and materials machine learning, most models still lack physical transferability: they fit correlations across whole molecules or crystals rather than learning the quantum interactions between atomic pairs. Yet bonding, charge redistribution, orbital hybridization, and electronic coupling all emerge from these two-body interactions that define local quantum fields in many-body systems. We introduce QuantumCanvas, a large-scale multimodal benchmark that treats two-body quantum systems as foundational units of matter. The dataset spans 2,850 element-element pairs, each annotated with 18 electronic, thermodynamic, and geometric properties and paired with ten-channel image representations derived from l- and m-resolved orbital densities, angular field transforms, co-occupancy maps, and charge-density projections. These physically grounded images encode spatial, angular, and electrostatic symmetries without explicit coordinates, providing an interpretable visual modality for quantum learning. Benchmarking eight architectures across 18 targets, we report mean absolute errors of 0.201 eV on energy gap using GATv2, 0.265 eV on HOMO and 0.274 eV on LUMO using EGNN. For energy-related quantities, DimeNet attains 2.27 eV total-energy MAE and 0.132 eV repulsive-energy MAE, while a multimodal fusion model achieves a 2.15 eV Mermin free-energy MAE. Pretraining on QuantumCanvas further improves convergence stability and generalization when fine-tuned on larger datasets such as QM9, MD17, and CrysMTM. By unifying orbital physics with vision-based representation learning, QuantumCanvas provides a principled and interpretable basis for learning transferable quantum interactions through coupled visual and numerical modalities. Dataset and model implementations are available at https://github.com/KurbanIntelligenceLab/QuantumCanvas.
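The figure of 2,850 element-element pairs matches the number of unordered pairs that can be formed from 75 elements when homonuclear pairs are included: $\binom{75}{2} + 75 = 2{,}775 + 75 = 2{,}850$. The abstract does not say how the pairs were enumerated, so this reading is an assumption. A minimal sketch of the count, using placeholder integer labels instead of element symbols:

```python
from itertools import combinations_with_replacement
from math import comb

N_ELEMENTS = 75  # number of elements reported in the abstract

# Assumption (not stated in the source): pairs are unordered and
# homonuclear pairs such as H-H are included.
# Integer labels 0..74 are placeholders for the actual element symbols.
pairs = list(combinations_with_replacement(range(N_ELEMENTS), 2))

heteronuclear = comb(N_ELEMENTS, 2)  # pairs of two different elements
homonuclear = N_ELEMENTS             # pairs of an element with itself

print(len(pairs), heteronuclear, homonuclear)  # prints: 2850 2775 75
```

Under this assumption the dataset would cover every possible pairing of the 75 elements exactly once.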

Paper Structure

This paper contains 40 sections, 26 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Periodic-table summary for the 75 elements included in 2,850 diatomic calculations. Each cell shows the element symbol, the mean Kohn–Sham energy gap, and the mean equilibrium bond distance. The energy-gap distribution is skewed toward metallic systems (median $0.00$ eV, IQR $0.00$–$0.36$ eV), while bond distances cluster near $2$–$3$ Å (median $2.48$ Å, IQR $2.16$–$2.91$ Å). The left column displays channels $0$–$4$ and the right column channels $5$–$9$; badges report per-channel mean $(\mu)$ and standard deviation $(\sigma)$ over all $2{,}850$ systems.
  • Figure 2: Comprehensive overview of the QuantumCanvas diatomic corpus. Panels (a)–(c) summarize the global distributions of energy gap, total energy, and bond length (log-scaled in (a)) while reporting sample coverage. Panel (d) provides a categorical breakdown of bond-length regimes, and panels (e)–(g) chart structure–property relationships for energy gap, total energy, and dipole magnitude. Panel (h) visualizes the distribution of energy gaps across elemental groups, whereas panels (i)–(k) present element-resolved “constellation” views of average gap, bond length, and dipole magnitude. Panel (l) stratifies dipole magnitudes into physically meaningful bins. The bottom row consolidates aggregate statistics: panel (m) displays the property correlation matrix, and panels (n)–(p) highlight the average energy gap, average bond length, and sample counts for each elemental group. Together, these subfigures illustrate the breadth, balance, and cross-property variability captured in QuantumCanvas, underscoring its value for benchmarking data-driven quantum chemistry models.
  • Figure 3: Panels (a–c) summarize the construction, organization, and downstream use of the two-body interaction tokens. (a) Representative two-body image tokens for selected element pairs; each inset is bordered by the color assigned to that pair type. (b) Two-dimensional PCA projection of all token embeddings, colored by the periodic-table group of one constituent element, showing that chemically related pairs cluster in the embedding space. (c) Assembly of two-body tokens into full molecular and crystalline structures for selected examples from QM9, MD17, and CrysMTM. For each bond, the corresponding token thumbnail is placed at the bond midpoint, illustrating how pairwise interactions compose into larger systems. Atoms are drawn with simplified CPK-like colors: carbon (dark gray), hydrogen (white), oxygen (red), nitrogen (blue), sulfur (yellow), and titanium (teal). Reported mean absolute errors show the improvement obtained when models are initialized from the two-body representation rather than trained from scratch.