Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer

Zihan Pengmei, Zhengyuan Shen, Zichen Wang, Marcus Collins, Huzefa Rangwala

TL;DR

This work explored the possibility of using pre-trained Geom-GNNs as transferable and highly effective geometric descriptors for improved generalization and demonstrated how all-atom graph embedding can be organically combined with other neural architectures to enhance the expressive power.

Abstract

Constructing transferable descriptors for conformation representation of molecular and biological systems finds numerous applications in drug discovery, learning-based molecular dynamics, and protein mechanism analysis. Geometric graph neural networks (Geom-GNNs) with all-atom information have transformed atomistic simulations by serving as general, learnable geometric descriptors for downstream tasks, including the prediction of interatomic potentials and molecular properties. However, common practice is to supervise Geom-GNNs on specific downstream tasks, which suffers from a lack of high-quality data and from inaccurate labels, leading to poor generalization and performance degradation in out-of-distribution (OOD) scenarios. In this work, we explored the possibility of using pre-trained Geom-GNNs as transferable and highly effective geometric descriptors for improved generalization. To explore their representation power, we studied the scaling behaviors of Geom-GNNs under self-supervised pre-training, supervised, and unsupervised learning setups. We find that the expressive power of different architectures can differ on the pre-training task. Interestingly, Geom-GNNs do not follow power-law scaling on the pre-training task, and universally lack predictable scaling behavior on the supervised tasks with quantum chemical labels, which are important for the screening and design of novel molecules. More importantly, we demonstrate how all-atom graph embedding can be organically combined with other neural architectures to enhance the expressive power. Meanwhile, the low-dimensional projection of the latent space shows excellent agreement with conventional geometrical descriptors.

Paper Structure

This paper contains 22 sections, 12 equations, 20 figures, 7 tables.

Figures (20)

  • Figure 1: Meta-architecture for using pre-trained Geom-GNNs as descriptors: The figure shows a framework where pre-trained Geom-GNNs act as local geometric descriptors to featurize residue-level conformations. Each window represents an atomic environment for residue feature extraction, specified by a user-defined context (illustrated here as a sliding window of nearest neighbors in sequence). In each window, atomic structures are treated as individual graphs and processed by the pre-trained Geom-GNN to extract atomic-level features, which are aggregated into residue-level representations or "tokens." The architecture can employ self-attention (SA), multi-layer perceptron (MLP), or message-passing mechanisms to enhance representational power. For graph-level tasks, the mixed tokens are pooled and passed to a task-specific head for training and prediction.
  • Figure 2: Visualization of the learned singular vectors of the Koopman operator projected onto the $\Psi-\Phi$ dihedral angle space of ala2, using pre-trained ViSNet embeddings of width 64. These singular vectors describe the slow modes of the underlying dynamics (Appendix \ref{app:vamp}).
  • Figure 3: Validation VAMP-2 scores using pre-trained ViSNet embeddings of length 256 (no token mixer) across various output dimensions and lag times for the pentapeptide system. A random half of the available trajectories is held out for validation, and the remaining half is used for training.
  • Figure 4: Scaling behavior of ViSNet depth for the converged results of training on the PCQM and Denali datasets. The plot shows the relationship between the number of layers and the pre-training loss, illustrating the initial rapid improvement followed by diminishing returns as depth increases. We additionally show the results of the first epoch loss in Figure \ref{fig:layer_scale_first_epoch} (Appendix).
  • Figure 6: HOMO-LUMO gap of 'gdb1', 'gdb2' and 'gdb3' predicted by varying combinations of common basis sets and density functionals. Different combinations showed considerable variance.
  • ...and 15 more figures
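
The windowed featurization described in the Figure 1 caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical stand-in for a pre-trained Geom-GNN (it just sums radial basis functions of interatomic distances), the window is a fixed number of sequence neighbors on each side, atom features are mean-pooled over the central residue's atoms to form a token, and the token mixer (SA, MLP, or message passing) is omitted. All function and parameter names are invented for this sketch.

```python
import numpy as np


def toy_encoder(coords, dim=8):
    """Hypothetical stand-in for a pre-trained Geom-GNN (assumption).

    Maps (n_atoms, 3) coordinates to (n_atoms, dim) per-atom features built
    from pairwise distances, so the output is rotation/translation invariant.
    """
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    centers = np.linspace(0.0, 5.0, dim)              # radial basis centers
    rbf = np.exp(-(dists[..., None] - centers) ** 2)  # (n_atoms, n_atoms, dim)
    return rbf.sum(axis=1)                            # sum over neighbor atoms


def residue_tokens(coords, residue_index, n_residues, half_window=1, dim=8):
    """Build one token per residue from its sliding-window atomic environment."""
    tokens = np.zeros((n_residues, dim))
    for r in range(n_residues):
        lo = max(0, r - half_window)
        hi = min(n_residues - 1, r + half_window)
        # Atoms of residue r and its sequence neighbors form one small graph.
        in_window = (residue_index >= lo) & (residue_index <= hi)
        atom_feats = toy_encoder(coords[in_window], dim)
        # Assumption: the token is the mean feature of the central residue's atoms.
        own = residue_index[in_window] == r
        tokens[r] = atom_feats[own].mean(axis=0)
    return tokens


def graph_embedding(tokens):
    """Mean-pool residue tokens into one graph-level vector (assumed pooling)."""
    return tokens.mean(axis=0)
```

In a real pipeline the encoder would be a frozen or fine-tuned Geom-GNN such as ViSNet, and the pooled vector would feed a task-specific head; both are left out here to keep the sketch self-contained.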