Efficient Training of Transformers for Molecule Property Prediction on Small-scale Datasets

Shivesh Prakash

TL;DR

The paper tackles BBB permeability prediction on small datasets by introducing a GPS Transformer augmented with Self Attention. By tailoring GPS blocks and opting for standard Attention over variants prone to overfitting, the approach achieves a ROC-AUC of $78.8\%$ on the BBBP dataset, outperforming prior methods by $5.5\%$. This demonstrates that effective transformer-based graph models can excel in low-data regimes for molecule property prediction, with practical implications for streamlined CNS drug discovery. The work emphasizes careful architectural choices and data handling (e.g., stratified sampling and augmentation) to maximize performance on limited data, offering a scalable route for BBB-related cheminformatics tasks.

Abstract

The blood-brain barrier (BBB) is a protective barrier that separates the brain from the circulatory system and regulates the passage of substances into the central nervous system. Assessing the BBB permeability of potential drugs is crucial for effective drug targeting. However, traditional experimental methods for measuring BBB permeability are challenging and impractical for large-scale screening, so computational approaches to predicting BBB permeability are needed. This paper proposes a GPS Transformer architecture augmented with Self Attention, designed to perform well in the low-data regime. The proposed approach achieves state-of-the-art performance on the BBB permeability prediction task using the BBBP dataset: with a ROC-AUC of 78.8%, it surpasses the previous best model by 5.5%. We demonstrate that standard Self Attention coupled with the GPS Transformer performs better than other attention variants coupled with the same architecture.
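
For readers unfamiliar with the reported metric, ROC-AUC can be read as the probability that a randomly chosen positive example (here, a BBB-permeable molecule) receives a higher score than a randomly chosen negative one. The following is a minimal sketch in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: the fraction of (positive, negative) pairs
    in which the positive is scored higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = permeable, 0 = non-permeable.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.5]
auc = roc_auc(labels, scores)
```

This quadratic-time form is chosen for readability. For real evaluation, a library routine such as scikit-learn's `roc_auc_score` is the usual choice.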

Paper Structure (15 sections, 4 equations, 1 figure, 1 table)

This paper contains 15 sections, 4 equations, 1 figure, 1 table.

Introduction
Related Work
Data
Methods
GPS Block
GPS++ Block
Our Architecture
Experimental Setup
Sampling
Model backbone
Baseline model
Loss function
Code
Results
Conclusion

Figures (1)

Figure 1: Overview of the proposed model. Note the architectural changes made relative to GPS++.
