Electron-Informed Coarse-Graining Molecular Representation Learning for Real-World Molecular Physics

Gyoung S. Na, Chanyoung Park

TL;DR

The paper addresses the gap between atom-level molecular representations and real-world molecular physics by introducing HEDMoL, which uses electron-density information without requiring expensive quantum calculations. Instead of relying on the full electronic structure of each molecule, it transfers electron-level information through substructures, using junction-tree decomposition, a GeoScattering-based knowledge extension, and hierarchical representation learning to fuse atom- and electron-level information. An energy-based physical consistency regularization ties the different representations to consistent energetics. Experiments on eight real-world datasets show state-of-the-art performance, including strong results with small training sets. The approach enables electron-informed predictions at scale, and the source code is publicly available for reproducibility.

Abstract

Various representation learning methods for molecular structures have been devised to accelerate data-driven chemistry. However, the representation capabilities of existing methods are essentially limited to atom-level information, which is not sufficient to describe real-world molecular physics. Although electron-level information can provide fundamental knowledge about chemical compounds beyond atom-level information, obtaining electron-level information for real-world molecules is computationally impractical and sometimes infeasible. We propose a method for learning electron-informed molecular representations without additional computational cost by transferring readily accessible electron-level information about small molecules to the large molecules of interest. The proposed method achieved state-of-the-art prediction accuracy on extensive benchmark datasets containing experimentally observed molecular physics. The source code for HEDMoL is available at https://github.com/ngs00/HEDMoL.

Paper Structure

This paper contains 24 sections, 13 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Basic assumptions of existing GNN-based methods in molecular representation learning, and their prediction processes on atom-level molecular structures.
  • Figure 2: The overall representation learning and prediction processes of HEDMoL to predict the target molecular property $y$ of the input atom-level molecular structure $\mathcal{A}$. In the exemplary input molecule, $\mathcal{R} = \{S_1, S_2, S_3\}$ is a set of the decomposed atom-level substructures, and $\mathcal{U}_e$ is a set of edges between $\{S_1, S_2, S_3\}$.
  • Figure 3: $R^2$-scores for different numbers of the training data. The black dotted line indicates the $R^2$-score of the best competitor method on the 80% training data.
  • Figure 4: $R^2$-scores of HEDMoL on the Lipop, ESOL, ADMET, and IGC50 datasets for different values of $\alpha$.
  • Figure 5: $R^2$-scores of HEDMoL on the Lipop, ESOL, ADMET, and IGC50 datasets for different values of $\lambda$.
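
Figures 3 to 5 report $R^2$-scores. For readers unfamiliar with the metric, the following is a minimal sketch of the standard coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. It assumes the paper uses this common definition, which the captions do not state. The function name `r2_score` and the sample values are illustrative only and are not taken from the HEDMoL code or results.

```python
def r2_score(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the usual textbook definition; the exact variant
    used in the paper's figures is not stated in the captions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean target.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only.
perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Under this definition a score of 1 means perfect prediction, and a score of 0 means the model does no better than always predicting the mean of the targets. Negative scores are possible for models that do worse than the mean.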