Molecule Attention Transformer
Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski
TL;DR
Molecule Attention Transformer (MAT) extends the Transformer encoder by integrating molecular graph structure and inter-atomic distances into the self-attention mechanism, yielding a single model for diverse molecular property prediction tasks. MAT performs competitively across a broad benchmark and, with simple node-level self-supervised pretraining, achieves state-of-the-art results while requiring only a few hyperparameters to be tuned. The approach provides chemically interpretable attention heads and shows robust transfer when pretrained on large molecular corpora. The work highlights a practical path toward easier-to-use, data-efficient deep learning for drug discovery and material design.
Abstract
Designing a single neural network architecture that performs competitively across a range of molecular property prediction tasks remains largely an open challenge, and its solution may unlock widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in the Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view.
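
To make the idea of augmenting attention with distances and graph structure concrete, the following is a minimal sketch, not the authors' implementation. It assumes the three sources of information are combined as a weighted sum: standard scaled dot-product attention, a row-wise softmax over negated inter-atomic distances, and the molecular adjacency matrix. The function name `molecule_attention`, the weights `lam_attn`, `lam_dist`, `lam_graph`, and the choice of distance transform are illustrative assumptions that the abstract does not specify.

```python
import math


def softmax_rows(m):
    """Row-wise softmax of a list-of-lists matrix."""
    out = []
    for row in m:
        mx = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def molecule_attention(q, k, v, dist, adj,
                       lam_attn=0.5, lam_dist=0.25, lam_graph=0.25):
    """Single attention head mixing three atom-to-atom weight matrices.

    q, k, v : per-atom query/key/value vectors (n_atoms x d)
    dist    : inter-atomic distance matrix (n_atoms x n_atoms)
    adj     : molecular graph adjacency matrix (n_atoms x n_atoms)
    The lam_* weights are hypothetical hyperparameters of this sketch.
    """
    n, d_k = len(q), len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    attn = softmax_rows(scores)
    # Assumed distance transform: closer atoms receive larger weights.
    dist_w = softmax_rows([[-x for x in row] for row in dist])
    mixed = [[lam_attn * attn[i][j]
              + lam_dist * dist_w[i][j]
              + lam_graph * adj[i][j]
              for j in range(n)] for i in range(n)]
    return matmul(mixed, v), mixed


# Toy three-atom chain 0-1-2 with made-up features and distances.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist = [[0.0, 1.5, 2.5], [1.5, 0.0, 1.5], [2.5, 1.5, 0.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

out, weights = molecule_attention(q, k, v, dist, adj)
```

Setting `lam_dist` and `lam_graph` to zero recovers ordinary self-attention in this sketch, so the structural terms act as an additive prior on which atoms attend to each other.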
