In-Context Learning of Physical Properties: Few-Shot Adaptation to Out-of-Distribution Molecular Graphs

Grzegorz Kaszuba, Amirhossein D. Naghdi, Dario Massa, Stefanos Papanikolaou, Andrzej Jaszkiewicz, Piotr Sankowski

TL;DR

The paper addresses the challenge of predicting out-of-distribution (OOD) molecular properties, specifically the absolute-zero atomization energy $U_0$, by integrating geometry-aware graph representations with in-context learning. It introduces a compound model in which MXMNet encodes molecular graphs and a GPT-2 transformer performs in-context regression over sequences of structure-label pairs, using substructure-based prompts derived from QM9. The authors use graph mining to construct an OOD benchmark by partitioning QM9 into a base subset and two OOD subsets (esters and oximes). They show that the in-context framework substantially improves OOD predictions over standard GNN baselines, while ablations reveal that the benefit of GPT-2 over linear readouts depends on the dataset. This approach offers a data-efficient pathway for materials discovery by using contextual information to generalize to novel molecular structures, and it provides a scalable benchmarking framework for in-context learning in molecular modeling.
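
The paper's graph-mining procedure is not reproduced in this summary. As a hedged illustration of the idea, the sketch below routes molecules into a base set and two OOD sets by checking two substructures on a toy heavy-atom graph: an ester group C(=O)-O-C, and any nitrogen-oxygen bond (which, per the Figure 2 caption, covers oximes and other N-O structures). `MolGraph`, `has_ester`, `has_n_o_bond`, and `partition` are hypothetical names; a real QM9 pipeline would more likely use a cheminformatics toolkit such as RDKit substructure matching. Sending a molecule that matches both patterns to both OOD sets is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class MolGraph:
    """Hypothetical minimal molecular graph: heavy-atom symbols plus bonds."""
    atoms: list[str]
    # each bond is (i, j, order); order 1 = single, 2 = double
    bonds: list[tuple[int, int, int]] = field(default_factory=list)

    def neighbors(self, i: int) -> Iterator[tuple[int, int]]:
        """Yield (neighbor index, bond order) for atom i."""
        for a, b, order in self.bonds:
            if a == i:
                yield b, order
            elif b == i:
                yield a, order


def has_n_o_bond(mol: MolGraph) -> bool:
    """True if any bond joins N and O (covers oximes and other N-O structures)."""
    return any({mol.atoms[a], mol.atoms[b]} == {"N", "O"} for a, b, _ in mol.bonds)


def has_ester(mol: MolGraph) -> bool:
    """True if some carbon carries C=O plus a single-bonded O linked to another carbon."""
    for c, elem in enumerate(mol.atoms):
        if elem != "C":
            continue
        nbrs = list(mol.neighbors(c))
        if not any(mol.atoms[n] == "O" and order == 2 for n, order in nbrs):
            continue  # no carbonyl on this carbon
        for o, order in nbrs:
            if mol.atoms[o] == "O" and order == 1 and any(
                mol.atoms[k] == "C" and k != c for k, _ in mol.neighbors(o)
            ):
                return True
    return False


def partition(mols: list[MolGraph]):
    """Split into (base, ood_ester, ood_oxime); a double match goes to both OOD sets."""
    base, ood_ester, ood_oxime = [], [], []
    for mol in mols:
        ester, n_o = has_ester(mol), has_n_o_bond(mol)
        if ester:
            ood_ester.append(mol)
        if n_o:
            ood_oxime.append(mol)
        if not (ester or n_o):
            base.append(mol)
    return base, ood_ester, ood_oxime


# Toy heavy-atom graphs (hydrogens implicit)
methyl_acetate = MolGraph(["C", "C", "O", "O", "C"], [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)])
acetone_oxime = MolGraph(["C", "C", "N", "O", "C"], [(0, 1, 1), (1, 2, 2), (2, 3, 1), (1, 4, 1)])
ethanol = MolGraph(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = MolGraph(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

base, ood_ester, ood_oxime = partition([methyl_acetate, acetone_oxime, ethanol, acetic_acid])
```

Here methyl acetate lands in the ester set and acetone oxime in the oxime set, while acetic acid stays in the base set because its single-bonded oxygen carries only a hydrogen.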

Abstract

Large language models exhibit the ability to adapt few-shot to a sequence of provided examples. This behavior, known as in-context learning, allows nontrivial machine learning tasks to be performed at inference time alone. In this work, we address the question: can we leverage in-context learning to predict out-of-distribution materials properties? For structure-property prediction tasks, this is not possible unless an effective method is found to pass atomic-level geometric features to the transformer model. To address this problem, we employ a compound model in which GPT-2 acts on the output of geometry-aware graph neural networks to exploit in-context information. To demonstrate our model's capabilities, we partition the QM9 dataset into sequences of molecules that share a common substructure and use them for in-context learning. This approach significantly improves the model's performance on out-of-distribution examples, surpassing that of general graph neural network models.
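
The abstract does not spell out how graph-network outputs and property labels are arranged for GPT-2. One common layout for in-context regression, assumed here rather than taken from the paper, interleaves each structure embedding with its label and ends with the query structure; a causal transformer's output at each structure token is then read as the estimate of that structure's label. `build_prompt` and `prediction_positions` are hypothetical helpers, and the embeddings below are toy numbers, not MXMNet outputs.

```python
from typing import Sequence

Vector = list[float]


def build_prompt(
    context_embeddings: Sequence[Vector],
    context_labels: Sequence[float],
    query_embedding: Vector,
) -> list[Vector]:
    """Interleave (structure, label) pairs as x_1, y_1, ..., x_k, y_k, x_query.

    Each scalar label is zero-padded to the embedding width so every token has
    the same dimension and can share one input projection.
    """
    if len(context_embeddings) != len(context_labels):
        raise ValueError("need exactly one label per context structure")
    dim = len(query_embedding)
    tokens: list[Vector] = []
    for x, y in zip(context_embeddings, context_labels):
        if len(x) != dim:
            raise ValueError("all embeddings must have the same width")
        tokens.append([float(v) for v in x])
        tokens.append([float(y)] + [0.0] * (dim - 1))
    tokens.append([float(v) for v in query_embedding])
    return tokens


def prediction_positions(num_context: int) -> list[int]:
    """Indices of the x tokens; a causal model's outputs there are read as label estimates."""
    return [2 * i for i in range(num_context + 1)]


# Two context molecules with toy 3-dimensional embeddings, plus one query
prompt = build_prompt([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [-1.0, -2.0], [0.7, 0.8, 0.9])
positions = prediction_positions(2)
```

Zero-padding keeps a single token width; in a full model, a learned linear layer would then map each token to GPT-2's hidden size, and only the output at the last position (the query) matters at inference time.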

Paper Structure (21 sections, 4 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Schematic of the partitioned dataset. Each prompt $p_{i} \in \mathcal{P}$ is created by dividing the dataset into sequences of molecules that share a common substructure. The test data are chosen so that they do not overlap with the training-set distribution.
  • Figure 2: Ester (left) and oxime (right) groups that define the OOD evaluation examples. If either substructure is present in a molecular graph, the molecule is removed from the training set (QM9-base) and assigned to the corresponding evaluation set, QM9-OOD-Ester or QM9-OOD-Oxime. Besides oximes, all other structures containing a nitrogen-oxygen bond are also selected for QM9-OOD-Oxime.
  • Figure 3: Schematic representation of MXMNet's readout operation. We use pooled representations that describe entire graphs, effectively replacing the linear layers shown.
  • Figure 4: Visualization of the in-context learning pipeline, together with the two ablations considered. Green: the module trained in advance. Blue: modules fitted during in-context training. White: a linear regression; this last readout scheme involves no deep-learning training.
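
The third readout in Figure 4 fits a linear regression instead of training a deep-learning module. The caption does not say exactly what the regression is fitted on, so the following is a minimal sketch under an assumption: the regression is refitted for each prompt by least squares on the context pairs (graph embeddings as inputs, property labels as targets) and then applied to the query embedding. The tiny `ridge` term only keeps the normal equations well conditioned; `fit_linear_readout`, `predict`, `_solve`, and the toy data are hypothetical.

```python
from typing import Sequence


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def fit_linear_readout(
    xs: Sequence[Sequence[float]], ys: Sequence[float], ridge: float = 1e-8
) -> list[float]:
    """Fit y ~ w.x + b on the prompt's context pairs via regularized normal equations."""
    rows = [list(x) + [1.0] for x in xs]  # trailing 1.0 feature fits the bias b
    d = len(rows[0])
    ata = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return _solve(ata, aty)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Apply fitted weights (last entry is the bias) to a query embedding."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Toy context whose labels follow y = 2*x0 - 3*x1 + 0.5 exactly
context_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_y = [2 * a - 3 * b + 0.5 for a, b in context_x]
weights = fit_linear_readout(context_x, context_y)
query_prediction = predict(weights, [2.0, 2.0])  # close to -1.5
```

Under this assumption the fit uses only the context examples, so the readout adapts to each prompt without any gradient updates, which makes it a natural baseline against the GPT-2 readout.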