On the Scalability of GNNs for Molecular Graphs

Maciej Sypetkowski, Frederik Wenkel, Farimah Poursafaei, Nia Dickson, Karush Suri, Philip Fradkin, Dominique Beaini

TL;DR

This work analyzes message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs and observes that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets.

Abstract

Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale, mainly due to the lower efficiency of sparse operations, large data requirements, and a lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets. We further demonstrate strong finetuning scaling behavior on 38 highly competitive downstream tasks, outclassing previous large models. This gives rise to MolGPS, a new graph foundation model for navigating chemical space, which outperforms the previous state of the art on 26 out of the 38 downstream tasks. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.

Paper Structure

This paper contains 33 sections, 4 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Summary of our GNN scaling hypotheses studied in the present work. The baseline model is presented in dark grey, followed by different scaling hypotheses illustrated in lighter colors. We analyze the scaling behavior of message-passing networks, graph Transformers and hybrid architectures with respect to the increasing scale of width, depth, number of molecules, number of labels, and diversity of datasets.
  • Figure 2: Effect of the different scaling types (columns) on test performance (rows). The standardized mean is the mean of the standardized scores over all tasks in a dataset group: for each task, a mean and standard deviation were computed across all models in this study, and each score was standardized with them (the signs of tasks with lower-is-better metrics were flipped).
  • Figure 3: Finetuning and probing performance of pretrained MPNN++ models of different widths on the Polaris benchmark. Darker green shades denote better metric values. Larger models tend to perform better on unseen tasks. Spearman correlation values closer to 1 indicate that predictive performance correlates with larger model sizes.
  • Figure 4: Comparison of our MolGPS foundation model (which combines fingerprints from the MPNN++, Transformer, and hybrid GPS++ models) to the SOTA across the TDC, Polaris, and MoleculeNet benchmarks. SOTA refers to the maximum value for each dataset. MolGPS establishes a new SOTA on $11/22$ TDC tasks and on all but one task among Polaris and MoleculeNet.
  • Figure 5: Comparison of our MPNN++ probing (which leverages multiple fingerprints, with and without additional phenomics pretraining) and MolGPS (which leverages fingerprints from MPNN++, Transformer, and GPS++) to various baselines across the TDC benchmark collection using an aggregated metric.
  • ...and 13 more figures
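
The standardized mean described in the Figure 2 caption and the Spearman correlation used in Figure 3 can be illustrated with a minimal sketch. This is not the authors' code: the function names, the task and model names, and the scores below are all hypothetical, and the use of the population standard deviation and of a tie-free Spearman formula are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def standardized_mean(scores, lower_is_better):
    """Average per-task z-scores for each model.

    scores: {task: {model: raw_score}} (hypothetical layout).
    lower_is_better: set of task names whose metric is an error (sign flipped).
    """
    per_model = {}
    for task, by_model in scores.items():
        values = list(by_model.values())
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population std over all models
        sign = -1.0 if task in lower_is_better else 1.0
        for model, raw in by_model.items():
            # Guard against a constant task, where the z-score is undefined.
            z = sign * (raw - mu) / sigma if sigma > 0 else 0.0
            per_model.setdefault(model, []).append(z)
    return {model: mean(zs) for model, zs in per_model.items()}


def spearman(xs, ys):
    """Spearman rank correlation, assuming no tied values in xs or ys."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores: one higher-is-better task and one lower-is-better task.
scores = {
    "task_auroc": {"small": 0.70, "medium": 0.75, "large": 0.80},
    "task_mae": {"small": 1.2, "medium": 1.0, "large": 0.8},
}
result = standardized_mean(scores, lower_is_better={"task_mae"})

# Hypothetical model widths versus their standardized means.
widths = [64, 256, 1024]
rho = spearman(widths, [result["small"], result["medium"], result["large"]])
```

Flipping the sign for lower-is-better tasks puts every task on a common "higher is better" scale before averaging, so error metrics and accuracy-style metrics can be combined in one score.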