Optimal message passing for molecular prediction is simple, attentive and spatial
Alma C. Castaneda-Leautaud, Rommie E. Amaro
TL;DR
This work investigates minimalist, bidirectional message-passing neural networks for molecular property prediction, demonstrating that simpler architectures with edge-aware attention can achieve state-of-the-art results. By systematically ablating self-nodes, introducing attention, and integrating 3D descriptors with 2D graphs, the authors show that dataset diversity modulates the need for additional components and that 2D representations augmented with carefully chosen 3D features can match fully 3D approaches while reducing computational cost by over 50%. Feature selection reveals buried volume and radius of gyration as consistently informative 3D-aware features, while traditional element-like features often hurt performance due to distributional biases. The ABMP model, which combines bidirectional message passing with edge-aware attention, delivers the strongest performance across multiple MoleculeNet benchmarks, with practical implications for fast, scalable drug discovery workflows. Overall, the study provides a principled, low-complexity pathway to high-performance molecular prediction with actionable guidance on feature engineering and model design.
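As an illustration of what a 3D-aware descriptor attached to a 2D graph can look like, the sketch below computes the mass-weighted radius of gyration of a single conformer and appends it to a molecule-level feature vector. The function names `radius_of_gyration` and `augment_graph_features` are hypothetical, and appending the descriptor to a graph-level vector is an assumption made for illustration; it is not necessarily how the authors integrate their descriptors.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one conformer.

    coords: list of (x, y, z) atomic positions.
    masses: optional list of atomic masses; equal weights if omitted.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one coordinate axis at a time.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def augment_graph_features(graph_features, descriptors_3d):
    """Hypothetical helper: concatenate 3D descriptors onto 2D graph-level features."""
    return list(graph_features) + list(descriptors_3d)


# Two equal-mass atoms placed symmetrically about the origin.
rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
features = augment_graph_features([0.2, 0.7], [rg])
```

Because the descriptor is a single scalar per molecule, the conformer is only needed once, at featurization time, and the model itself still operates on the 2D graph.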
Abstract
The predictive performance of message-passing neural networks (MPNNs) for molecular property prediction can be improved by simplifying how messages are passed and by using descriptors that capture multiple aspects of molecular graphs. In this work, we designed model architectures that achieved state-of-the-art performance, surpassing more complex models such as those pre-trained on external databases. We assessed dataset diversity to complement our performance results, finding that structural diversity influences the need for additional components in our MPNNs and feature sets. In most datasets, our best architecture employs bidirectional message passing with an attention mechanism, applied to a minimalist message formulation that excludes self-perception, highlighting that models simpler than classical MPNNs yield higher class separability. In contrast, we found that convolution normalization factors did not benefit predictive power across the datasets tested, a result corroborated in both global and node-level outputs. Additionally, we analyzed the influence of both adding spatial features and working with 3D graphs, finding that 2D molecular graphs are sufficient when complemented with appropriately chosen 3D descriptors. This approach not only preserves predictive performance but also reduces computational cost by over 50%, making it particularly advantageous for high-throughput screening campaigns.
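The ingredients named above (bidirectional passing, edge-aware attention, no self term, no normalization factor) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `attentive_bidirectional_step` is hypothetical, there are no learned weights, edge features are assumed to have the same length as node features, the message is taken to be the element-wise sum of neighbor and bond features, and the attention score is their dot product.

```python
import math


def attentive_bidirectional_step(node_feats, edges, edge_feats):
    """One message-passing step with edge-aware attention and no self term.

    node_feats: list of per-atom feature lists.
    edges: list of (i, j) bonds, each listed once.
    edge_feats: list of per-bond feature lists (same length as node features).
    """
    # Bidirectional: every bond delivers a message in both directions.
    incoming = {v: [] for v in range(len(node_feats))}
    for (i, j), e in zip(edges, edge_feats):
        incoming[j].append((i, e))
        incoming[i].append((j, e))

    updated = []
    for v in range(len(node_feats)):
        dim = len(node_feats[v])
        neighbors = incoming[v]
        if not neighbors:
            updated.append([0.0] * dim)  # isolated atom receives nothing
            continue
        # Assumed message: neighbor features plus bond features.
        messages = [
            [a + b for a, b in zip(node_feats[u], e)] for u, e in neighbors
        ]
        # Assumed edge-aware score: dot product of neighbor and bond features.
        scores = [
            sum(a * b for a, b in zip(node_feats[u], e)) for u, e in neighbors
        ]
        # Softmax over neighbors (shifted by the max for numerical stability).
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Attention-weighted sum; the atom's own features are never added,
        # and no degree-based normalization factor is applied.
        updated.append(
            [sum(w * m[d] for w, m in zip(weights, messages)) for d in range(dim)]
        )
    return updated


# Three-atom chain 0-1-2 with zero bond features, so attention is uniform.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bonds = [(0, 1), (1, 2)]
bond_feats = [[0.0, 0.0], [0.0, 0.0]]
out = attentive_bidirectional_step(nodes, bonds, bond_feats)
```

In this sketch the updated state of an atom depends only on its neighbors and bonds, so changing an atom's own input features leaves its own output unchanged, which is one way to read "excludes self-perception".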
