MoleculeNet: A Benchmark for Molecular Machine Learning

Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande

TL;DR

MoleculeNet provides a unified, open benchmark for molecular machine learning by compiling 17 diverse datasets across quantum, physical, biophysical, and physiological properties, and implementing standardized splits, metrics, and varied featurizations in DeepChem. The study demonstrates that graph-based, learnable representations generally outperform conventional methods on many tasks, particularly with sufficient data, while noting caveats for data-scarce or highly imbalanced problems where physics-aware features or alternative models may prevail. By enabling modular comparisons of featurization, models, and splits, MoleculeNet exposes domain-specific insights: distance-aware representations excel on quantum mechanical tasks, whereas scaffold and time splits probe generalization more stringently than random splits. The benchmark aims to catalyze rapid methodological progress in molecular ML and to serve as a lasting resource for researchers, with ongoing expansion and community contributions encouraged via the DeepChem ecosystem.
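
To illustrate why a scaffold split is a harder test than a random split, here is a minimal sketch in plain Python. It assumes scaffold labels have already been computed for each molecule; the label strings are hypothetical placeholders rather than MoleculeNet data, and `group_split` is an illustrative helper, not a DeepChem function. Placing the largest groups into training first is one common heuristic and is used here only as an assumption.

```python
from collections import defaultdict


def group_split(keys, frac_train=0.8, frac_valid=0.1):
    """Split sample indices so that samples sharing a key stay in one subset."""
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)

    # Largest groups first; ties broken by first index so the result is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(keys)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten molecules (illustration only).
scaffolds = ["benzene"] * 5 + ["pyridine"] * 3 + ["indole", "furan"]
train, valid, test = group_split(scaffolds)

# No scaffold appears in more than one subset.
train_keys = {scaffolds[i] for i in train}
held_out_keys = {scaffolds[i] for i in valid + test}
print(train_keys.isdisjoint(held_out_keys))
```

Because the held-out subsets contain only scaffolds never seen in training, a model cannot score well simply by recognizing near-duplicates of training molecules, which is what a random split would allow.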

Abstract

Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.

Paper Structure

This paper contains 78 sections, 4 equations, 15 figures, 13 tables.

Figures (15)

  • Figure 1: Example code for benchmark evaluation with DeepChem. Multiple methods are provided for data splitting, featurization, and learning.
  • Figure 2: Tasks in different datasets focus on different levels of properties of molecules.
  • Figure 3: Representation of Data Splits in MoleculeNet.
  • Figure 4: Receiver operating characteristic (ROC) curves and precision-recall curves (PRC) for predictions of logistic regression and graph convolutional models under different class imbalance conditions (details listed in Table \ref{tab:ROCvsPRC_AUC}). A, B: task "FDA_APPROVED" from ClinTox, test subset; C, D: task "Hepatobiliary disorders" from SIDER, test subset; E, F: task "NR-ER" from Tox21, validation subset; G, H: task "HIV_active" from HIV, test subset. Black dashed lines show the performance of random classifiers.
  • Figure 5: Diagrams of featurizations in MoleculeNet.
  • ...and 10 more figures
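
The ROC versus PRC comparison in Figure 4 can be illustrated with a small, self-contained sketch. The labels and scores below are made up for illustration and are not taken from any MoleculeNet dataset. `roc_auc` uses the pairwise-ranking definition of ROC-AUC, and `average_precision` is one common step-wise estimate of the area under the precision-recall curve; the paper's exact PRC-AUC computation may differ.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


# Hypothetical imbalanced task: 2 positives among 20 samples (10% positive rate).
# Two negatives (scores 80 and 70) are ranked above the second positive (score 60).
labels = [1, 1] + [0] * 18
scores = [90, 60] + [80, 70] + list(range(50, 34, -1))

roc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC = {roc:.3f}, average precision = {ap:.3f}")
```

In this toy case just two high-scoring negatives barely move ROC-AUC, since they are diluted by the many correctly ranked negatives, but they halve the precision at the second positive. A random classifier scores about 0.5 on ROC-AUC regardless of class balance, while its expected precision equals the positive rate, so the precision-recall view is more sensitive to false positives when positives are rare.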