Electron-Informed Coarse-Graining Molecular Representation Learning for Real-World Molecular Physics
Gyoung S. Na, Chanyoung Park
TL;DR
The paper tackles the gap between atom-level molecular representations and real-world molecular physics by introducing HEDMoL, which leverages electron-density information without expensive quantum calculations. Rather than relying on full electronic-structure calculations for each target molecule, it transfers electron-level information through molecular substructures, combining junction-tree decomposition, a GeoScattering-based knowledge extension, and hierarchical representation learning to fuse atom- and electron-level information. An energy-based physical consistency regularization keeps the atom-level and electron-level representations energetically consistent. Experiments on eight real-world datasets show state-of-the-art performance, including strong results with small training sets. The approach makes electron-informed predictions practical at scale for real-world chemistry, and the code is publicly available for reproducibility.
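The substructure-based transfer described above can be illustrated with a toy sketch. It is not the authors' implementation: the partition into substructures is supplied by hand rather than computed by junction-tree decomposition, and the function name `coarse_grain`, the fragment keys, and the descriptor values are all hypothetical placeholders. It only shows the general idea of building a coarse-grained graph whose nodes are substructures carrying electron-level descriptors looked up from a table of small molecules.

```python
from itertools import combinations


def coarse_grain(substructures, bonds, electron_table):
    """Build a coarse-grained graph whose nodes are substructures.

    substructures:  dict name -> (fragment_key, set of atom indices)
    bonds:          iterable of (i, j) atom-index pairs
    electron_table: dict fragment_key -> list of floats
                    (hypothetical electron-level descriptors)
    """
    # Node features: descriptors looked up by fragment key (None if unknown).
    nodes = {
        name: electron_table.get(key)
        for name, (key, _) in substructures.items()
    }

    # Two substructures are connected if they share an atom
    # or if a bond links an atom of one to an atom of the other.
    edges = set()
    for (a, (_, atoms_a)), (b, (_, atoms_b)) in combinations(
        substructures.items(), 2
    ):
        shares_atom = bool(atoms_a & atoms_b)
        bonded = any(
            (i in atoms_a and j in atoms_b) or (i in atoms_b and j in atoms_a)
            for i, j in bonds
        )
        if shares_atom or bonded:
            edges.add((a, b))
    return nodes, edges


# Toluene-like toy graph: atoms 0-5 form the ring, atom 6 is the methyl carbon.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
substructures = {
    "ring": ("benzene", {0, 1, 2, 3, 4, 5}),
    "link": ("ethane", {0, 6}),
}
# Placeholder descriptor values, not real electron-density data.
electron_table = {"benzene": [0.1, 0.2], "ethane": [0.3, 0.4]}

nodes, edges = coarse_grain(substructures, bonds, electron_table)
```

Here `nodes` maps each substructure to its looked-up descriptor and `edges` records which substructures are adjacent; a learned model would then operate on this coarse graph alongside the atom-level graph.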
Abstract
Various representation learning methods for molecular structures have been devised to accelerate data-driven chemistry. However, the representation capabilities of existing methods are essentially limited to atom-level information, which is not sufficient to describe real-world molecular physics. Although electron-level information can provide fundamental knowledge about chemical compounds beyond atom-level information, obtaining it for real-world molecules is computationally impractical and sometimes infeasible. We propose a method for learning electron-informed molecular representations without additional computational cost by transferring readily accessible electron-level information about small molecules to the large molecules of interest. The proposed method achieved state-of-the-art prediction accuracy on extensive benchmark datasets of experimentally observed molecular properties. The source code for HEDMoL is available at https://github.com/ngs00/HEDMoL.
