
GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts

Deyu Zou, Shikun Liu, Siqi Miao, Victor Fung, Shiyu Chang, Pan Li

TL;DR

We propose GeSS, a comprehensive benchmark for evaluating GDL models in scientific scenarios with distribution shifts. It is intended to provide insights for GDL researchers and for domain practitioners who plan to use GDL in their applications.

Abstract

Geometric deep learning (GDL) has gained significant attention in scientific fields for its proficiency in modeling data with intricate geometric structures. However, very few works have examined its ability to handle distribution shift, a prevalent challenge in many applications. To bridge this gap, we propose GeSS, a comprehensive benchmark for evaluating the performance of GDL models in scientific scenarios with distribution shifts. Our evaluation datasets span diverse scientific domains, from particle physics and materials science to biochemistry, and cover a broad spectrum of distribution shifts, including conditional, covariate, and concept shifts. Furthermore, we study three levels of access to information from the out-of-distribution (OOD) test data: no OOD information, only unlabeled OOD data, and OOD data with a few labels. Overall, our benchmark comprises 30 experiment settings and evaluates 3 GDL backbones and 11 learning algorithms in each setting. We provide a thorough analysis of the evaluation results, intended to offer insights for GDL researchers and domain practitioners who plan to use GDL in their applications.

Paper Structure (34 sections, 1 equation, 6 figures, 12 tables)

Figures (6)

  • Figure 1: Overview of distribution shifts in this study. The upper (green-colored) and lower (blue-colored) instances represent the scenarios in domains $\cal S$ and $\cal T$, respectively. (a) Three-dimensional trajectories of particles in a collision event, which are simulated with a magnetic field parallel to the $z$ axis and plotted on a 2D plane; (b) For the same set of MOFs, the distribution of calculated band gap values exhibits a bimodal (unimodal) nature with lower (higher) expectations under PBE (HSE06) estimation; (c) Molecular three-dimensional stick models with different scaffold IDs across $\cal S$ and $\cal T$.
  • Figure 2: Test-OOD improvements (%) over ERM for $\rm TL_{100}$ and $\rm TL_{1000}$ across different shift cases (in the EGNN backbone).
  • Figure 3: (a) Test-OOD improvements (%) over ERM for VREx, DeepCoral, $\rm TL_{100}$ and $\rm TL_{1000}$ methods in Fidelity Shifts (including HSE06 and HSE06* cases) in the EGNN backbone; (b)/(c) KDE (Rosenblatt, 1956; Parzen, 1962) curves of the marginal label distribution $\mathbb{P}(Y)$ across the source $\cal S$ and target $\cal T$ in the cases of HSE06 / HSE06*.
  • Figure 4: Test-OOD improvements (%) over ERM for GroupDRO, MixUp, LRI, and DANN methods across Pileup Shifts (cases of PU50/90) and Signal Shifts (cases of $\tau\to3\mu$ and $z'_{10}\to2\mu$) in the EGNN backbone.
  • Figure 5: SCMs of covariate, concept, $\cal I$-conditional, and $\cal C$-conditional shifts.
  • ...and 1 more figure
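
Several of the captions above report "Test-OOD improvements (%) over ERM". The exact formula is not given on this page, so the following is only a minimal sketch that assumes the improvement is the relative change of a method's Test-OOD score with respect to the ERM baseline. The function name `improvement_over_erm` and the `higher_is_better` flag are hypothetical and are not taken from the paper or its code.

```python
def improvement_over_erm(method_score: float,
                         erm_score: float,
                         higher_is_better: bool = True) -> float:
    """Relative change (%) of a method's Test-OOD score with respect to ERM.

    Assumed definition (not confirmed by the source):
        100 * (method - ERM) / |ERM|
    The sign is flipped for error-like metrics, so that a positive value
    always means "better than ERM".
    """
    if erm_score == 0:
        raise ValueError("ERM score must be non-zero to compute a relative change")
    change = (method_score - erm_score) / abs(erm_score) * 100.0
    return change if higher_is_better else -change


# Hypothetical numbers, for illustration only (not results from the paper).
print(improvement_over_erm(0.66, 0.60))                          # accuracy-like metric
print(improvement_over_erm(0.45, 0.50, higher_is_better=False))  # error-like metric
```

Under this assumed convention, a bar above zero in such a plot means the method beats ERM on the target domain, whether the underlying metric is accuracy-like or error-like.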