Benchmarking GNNs for OOD Materials Property Prediction with Uncertainty Quantification

Liqin Tan; Pin Chen; Menghan Liu; Xiean Wang; Jianhuan Cen; Qingsong Zou

TL;DR

This work introduces MatUQ, a benchmark for evaluating GNNs on out-of-distribution (OOD) materials property prediction with uncertainty quantification. It combines a structure-aware OOD split (SOAP-LOCO), uncertainty-aware training using Monte Carlo Dropout and Deep Evidential Regression (DER), and a new uncertainty metric (D-EviU) to jointly assess accuracy and uncertainty. Across 1,375 OOD tasks from six datasets, MatUQ shows that no single model dominates all tasks, although structure-aware transformers and angular-feature models often excel on specific properties. Uncertainty-aware training yields substantial MAE reductions (up to 70.6% on D1–D3 and 84.5% on D3). The framework offers practical guidance for selecting reliable models under distribution shift in materials discovery and supports future work on uncertainty calibration and OOD evaluation.

Abstract

We present MatUQ, a benchmark framework for evaluating graph neural networks (GNNs) on out-of-distribution (OOD) materials property prediction with uncertainty quantification (UQ). MatUQ comprises 1,375 OOD prediction tasks constructed from six materials datasets using five OFM-based splitting strategies and a newly proposed structure-aware splitting strategy, SOAP-LOCO, which captures local atomic environments more effectively. We evaluate 12 representative GNN models under a unified uncertainty-aware training protocol that combines Monte Carlo Dropout and Deep Evidential Regression (DER), and we introduce a novel uncertainty metric, D-EviU, which shows the strongest correlation with prediction errors in most tasks. Our experiments yield two key findings. First, the uncertainty-aware training approach significantly improves prediction accuracy, reducing errors by an average of 70.6% across challenging OOD scenarios. Second, no single model dominates universally: earlier models such as SchNet and ALIGNN remain competitive, while newer models such as CrystalFramer and SODNet perform better on specific material properties. These results provide practical insights for selecting reliable models under distribution shifts in materials discovery.
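To make the combination of Monte Carlo Dropout and DER more concrete, the sketch below shows one plausible way to turn several dropout-enabled forward passes of an evidential regression head into a prediction plus aleatoric and epistemic variances. It uses the standard Normal-Inverse-Gamma formulas from evidential regression (aleatoric = β/(α−1), epistemic = β/(ν(α−1))). The function names, the tuple layout `(gamma, nu, alpha, beta)`, and the way the passes are pooled (law of total variance) are assumptions made for illustration. This is not the paper's D-EviU metric, whose definition is not given in this section.

```python
from statistics import fmean, pvariance


def der_uncertainties(gamma, nu, alpha, beta):
    """Standard DER moments for one Normal-Inverse-Gamma output.

    Returns (prediction, aleatoric variance, epistemic variance).
    """
    if nu <= 0.0 or alpha <= 1.0 or beta <= 0.0:
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)          # E[sigma^2]
    epistemic = beta / (nu * (alpha - 1.0))   # Var[mu]
    return gamma, aleatoric, epistemic


def combine_mc_passes(passes):
    """Pool T stochastic (dropout-enabled) forward passes.

    `passes` is a list of (gamma, nu, alpha, beta) tuples, one per pass.
    Assumption: epistemic variance is the mean evidential epistemic term
    plus the spread of the predictions across passes.
    """
    stats = [der_uncertainties(*p) for p in passes]
    preds = [s[0] for s in stats]
    mean_pred = fmean(preds)
    aleatoric = fmean(s[1] for s in stats)
    epistemic = fmean(s[2] for s in stats) + pvariance(preds)
    return mean_pred, aleatoric, epistemic


if __name__ == "__main__":
    # Two hypothetical passes for a single material.
    example = [(1.0, 2.0, 3.0, 4.0), (3.0, 2.0, 3.0, 4.0)]
    print(combine_mc_passes(example))
```

In practice the four parameters would come from a GNN output head evaluated with dropout left active at inference time; the pooling rule here is only one reasonable choice among several.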
