QuantumCanvas: A Multimodal Benchmark for Visual Learning of Atomic Interactions

Can Polat; Erchin Serpedin; Mustafa Kurban; Hasan Kurban

TL;DR

QuantumCanvas reframes atomic interactions by treating two-body quantum systems as fundamental units and pairing scalar quantum descriptors with interpretable, coordinate-free orbital images. The dataset covers 2,850 diatomics across 75 elements, with 18 targets and ten-channel image representations derived from orbital populations, enabling multimodal learning that couples quantum physics with vision-based features. Benchmark results across eight architectures and 18 targets reveal modality-specific strengths and show that two-body pretraining improves convergence and generalization to molecular and crystalline domains. This work provides a principled foundation for transferable quantum learning and opens a path to extending multimodal representations to triatomic and time-dependent regimes.

Abstract

Despite rapid advances in molecular and materials machine learning, most models still lack physical transferability: they fit correlations across whole molecules or crystals rather than learning the quantum interactions between atomic pairs. Yet bonding, charge redistribution, orbital hybridization, and electronic coupling all emerge from these two-body interactions, which define local quantum fields in many-body systems. We introduce QuantumCanvas, a large-scale multimodal benchmark that treats two-body quantum systems as foundational units of matter. The dataset spans 2,850 element-element pairs, each annotated with 18 electronic, thermodynamic, and geometric properties and paired with ten-channel image representations derived from l- and m-resolved orbital densities, angular field transforms, co-occupancy maps, and charge-density projections. These physically grounded images encode spatial, angular, and electrostatic symmetries without explicit coordinates, providing an interpretable visual modality for quantum learning. Benchmarking eight architectures across 18 targets, we report mean absolute errors (MAEs) of 0.201 eV on the energy gap using GATv2, and of 0.265 eV on HOMO and 0.274 eV on LUMO using EGNN. For energy-related quantities, DimeNet attains a 2.27 eV total-energy MAE and a 0.132 eV repulsive-energy MAE, while a multimodal fusion model achieves a 2.15 eV Mermin free-energy MAE. Pretraining on QuantumCanvas further improves convergence stability and generalization when fine-tuned on larger datasets such as QM9, MD17, and CrysMTM. By unifying orbital physics with vision-based representation learning, QuantumCanvas provides a principled and interpretable basis for learning transferable quantum interactions through coupled visual and numerical modalities. Dataset and model implementations are available at https://github.com/KurbanIntelligenceLab/QuantumCanvas.
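
The pair count can be checked with a short sketch. It assumes, as the numbers suggest but the text does not state outright, that the 2,850 pairs are all unordered pairs drawn from the 75 elements, with homonuclear pairs (A-A) included. The integer labels are hypothetical placeholders, not the dataset's actual element list.

```python
from itertools import combinations_with_replacement

N_ELEMENTS = 75  # element count stated in the TL;DR

# Hypothetical placeholder labels; the real element list is not given here.
elements = range(1, N_ELEMENTS + 1)

# Unordered pairs where an element may pair with itself:
# A-B is counted once (A-B == B-A), and A-A is allowed.
pairs = list(combinations_with_replacement(elements, 2))

homonuclear = [p for p in pairs if p[0] == p[1]]
heteronuclear = [p for p in pairs if p[0] != p[1]]

print(len(pairs), len(homonuclear), len(heteronuclear))
```

Under this assumption the total is 75 · 76 / 2 = 2,850, split into 75 homonuclear and 2,775 heteronuclear pairs, which matches the reported dataset size.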
