Beyond Atoms: Evaluating Electron Density Representation for 3D Molecular Learning
Patricia Suriana, Joshua A. Rackers, Ewa M. Nowara, Pedro O. Pinheiro, John M. Nicoloudis, Vishnu Sresht
TL;DR
This study benchmarks voxel-based representations for 3D molecular learning, comparing atom-type encodings against raw electron density and its gradient magnitude, plus a Shape-Only baseline. Across two tasks, PDBbind binding affinity and QM9 quantum properties, density-based inputs improve data efficiency in low-data regimes for binding and give better accuracy at scale for quantum properties, though results depend on the specific task and data quality. The optimal representation is therefore task- and regime-dependent, with density-based inputs capturing electronic structure information that atom-centric schemes may miss. The work also discusses practical considerations, such as experimental versus approximated densities and the computational cost of voxel representations, pointing toward hybrid or adaptive approaches for broader applicability.
Abstract
Machine learning models for 3D molecular property prediction typically rely on atom-based representations, which may overlook subtle physical information. Electron density maps, the direct output of X-ray crystallography and cryo-electron microscopy, offer a continuous, physically grounded alternative. We compare three voxel-based input types for 3D convolutional neural networks (CNNs), namely atom types, raw electron density, and density gradient magnitude, across two molecular tasks: protein-ligand binding affinity prediction (PDBbind) and quantum property prediction (QM9). We focus on voxel-based CNNs because electron density is inherently volumetric, and voxel grids provide the most natural representation for both experimental and computed densities. On PDBbind, all representations perform similarly with full data, but in low-data regimes density-based inputs outperform atom types, while a shape-based baseline performs comparably, suggesting that spatial occupancy dominates this task. On QM9, where labels are derived from Density Functional Theory (DFT) but input densities come from a lower-level method (XTB), density-based inputs still outperform atom-based ones at scale, reflecting the rich structural and electronic information encoded in density. Overall, these results highlight the task- and regime-dependent strengths of density-derived inputs: improved data efficiency in affinity prediction and improved accuracy in quantum property modeling.
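To make the compared input types concrete, the following is a minimal NumPy sketch of how atom-type, density, gradient-magnitude, and shape-only voxel grids could be built for a small molecule. It is an illustration under stated assumptions, not the pipeline used in the paper: the density here is a toy sum of atom-centred Gaussians weighted by atomic number (standing in for experimental or XTB-computed density), and the function names, grid size, spacing, Gaussian width, and occupancy threshold are all hypothetical choices.

```python
import numpy as np

# Atomic numbers, used only to weight the toy stand-in density (assumption).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}

def make_grid(n, spacing):
    """Coordinates of an n^3 cubic grid centred on the origin, shape (n, n, n, 3)."""
    axis = (np.arange(n) - (n - 1) / 2.0) * spacing
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def gaussian_sum(grid, coords, weights, sigma):
    """Weighted sum of isotropic Gaussians centred on the given atoms."""
    vol = np.zeros(grid.shape[:3])
    for xyz, w in zip(coords, weights):
        d2 = np.sum((grid - xyz) ** 2, axis=-1)
        vol += w * np.exp(-d2 / (2.0 * sigma ** 2))
    return vol

def voxelize(coords, elements, channels=("H", "C", "N", "O"),
             n=24, spacing=0.5, sigma=0.6):
    """Build four hypothetical voxel inputs for one molecule (coords in angstrom)."""
    coords = np.asarray(coords, dtype=float)
    elements = list(elements)
    grid = make_grid(n, spacing)

    # Atom types: one channel per element, each atom smeared as a unit Gaussian.
    atom_channels = []
    for c in channels:
        mask = np.array([e == c for e in elements])
        atom_channels.append(
            gaussian_sum(grid, coords[mask], np.ones(int(mask.sum())), sigma))
    atom_types = np.stack(atom_channels)

    # Density: toy stand-in, Gaussians weighted by atomic number (NOT real density).
    z = [ATOMIC_NUMBER[e] for e in elements]
    density = gaussian_sum(grid, coords, z, sigma)

    # Gradient magnitude: finite differences of the density along each grid axis.
    gx, gy, gz = np.gradient(density, spacing)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)

    # Shape only: binary occupancy that ignores element identity (threshold assumed).
    occupancy = gaussian_sum(grid, coords, np.ones(len(elements)), sigma)
    shape_only = (occupancy > 0.5).astype(float)

    return {"atom_types": atom_types, "density": density,
            "grad_mag": grad_mag, "shape_only": shape_only}

# Usage: a water-like geometry with approximate coordinates in angstrom.
feats = voxelize([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
                 ["O", "H", "H"])
```

Each returned array can be fed to a 3D CNN as one or more input channels. The atom-type grid keeps element identity but no electronic detail, the shape-only grid keeps neither, and the density and gradient grids are single-channel fields whose information content depends entirely on where the density comes from.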
