In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs
Grzegorz Kaszuba, Amirhossein D. Naghdi, Dario Massa, Stefanos Papanikolaou, Andrzej Jaszkiewicz, Piotr Sankowski
TL;DR
The paper addresses the prediction of molecular properties for out-of-distribution (OOD) structures, specifically the atomization energy at absolute zero, $U_0$, by combining geometry-aware graph representations with in-context learning. It introduces a compound model in which MXMNet encodes molecular graphs and a GPT-2 transformer performs in-context regression over sequences of structure-label pairs, with prompts built from QM9 molecules that share a substructure. The authors construct an OOD benchmark by using graph mining to partition QM9 into a base subset and two OOD subsets (Esters and Oximes). They show that the in-context framework substantially improves OOD predictions over standard GNN baselines, and their ablations indicate that the benefit of GPT-2 over a linear readout depends on the dataset. The approach offers a data-efficient route to materials discovery by using contextual examples to generalize to novel molecular structures, and it provides a scalable framework for benchmarking in-context learning in molecular modeling.
Abstract
Large language models can adapt, from only a few examples, to a sequence provided in their input. This behavior, known as in-context learning, allows nontrivial machine learning tasks to be performed at inference time alone. In this work, we address the question: can we leverage in-context learning to predict out-of-distribution materials properties? For structure-property prediction tasks, this is possible only if there is an effective method for passing atomic-level geometric features to the transformer model. To address this problem, we employ a compound model in which GPT-2 acts on the output of geometry-aware graph neural networks to adapt to in-context information. To demonstrate our model's capabilities, we partition the QM9 dataset into sequences of molecules that share a common substructure and use them for in-context learning. This approach significantly improves the performance of the model on out-of-distribution examples, surpassing that of general graph neural network models.
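To make the idea of in-context regression over structure-label pairs concrete, the following minimal sketch shows one way such a prompt could be assembled before it is handed to a transformer. It is an illustration under stated assumptions, not the authors' implementation: the function name `build_prompt`, the interleaved layout (each embedding followed by its label, with the query last), and the zero-padding of scalar labels to the embedding width are all hypothetical choices, and the toy three-dimensional vectors merely stand in for graph-encoder outputs.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def build_prompt(context: Sequence[Tuple[Vector, float]], query: Vector) -> List[Vector]:
    """Return [x_1, y_1, ..., x_k, y_k, x_query] as equal-width token vectors.

    Assumed layout (hypothetical, not taken from the paper): every scalar
    label is stored in the first slot of a vector padded with zeros so that
    label tokens have the same width as embedding tokens.
    """
    dim = len(query)
    if dim < 1:
        raise ValueError("query embedding must have at least one dimension")

    tokens: List[Vector] = []
    for embedding, label in context:
        if len(embedding) != dim:
            raise ValueError("all embeddings must share the query's dimension")
        tokens.append(list(embedding))
        # Scalar label in slot 0, remaining slots are zero padding.
        tokens.append([float(label)] + [0.0] * (dim - 1))

    # The query structure comes last; its label is what the model should predict.
    tokens.append(list(query))
    return tokens


# Toy 3-dimensional vectors standing in for graph-encoder outputs,
# paired with made-up property values.
context = [([0.1, 0.2, 0.3], -1.5), ([0.4, 0.5, 0.6], -2.0)]
prompt = build_prompt(context, [0.7, 0.8, 0.9])
```

With k context pairs the prompt holds 2k + 1 tokens, and the prediction for the query would be read from the transformer's output at the final position. In the setting described above, the context pairs would be molecules that share a substructure with the query.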
