Improving out-of-distribution generalization in graphs via hierarchical semantic environments

Yinhua Piao, Sangseon Lee, Yijingxiu Lu, Sun Kim

TL;DR

This work tackles graph out-of-distribution generalization by generating hierarchical semantic environments that capture complex distribution shifts in datasets such as DrugOOD. The method combines hierarchical stochastic subgraph generation with hierarchical environment inference and two contrastive losses (EnvCon and LabelCon) to achieve robust graph invariant learning, enforced via an overall objective $\mathcal{L}_{HEI}$. The approach emphasizes intra-hierarchy environment diversification and inter-hierarchy environment augmentation to model relationships among environments. Empirically, it improves ROC-AUC by up to 1.29% on IC50 and 2.83% on EC50 (DrugOOD), outperforming state-of-the-art graph OOD baselines.

Abstract

Out-of-distribution (OOD) generalization in the graph domain is challenging due to complex distribution shifts and a lack of environmental contexts. Recent methods attempt to enhance graph OOD generalization by generating flat environments. However, such flat environments are inherently limited in capturing more complex data distributions. For the DrugOOD dataset, which contains diverse training environments (e.g., scaffold and size), flat contexts cannot sufficiently address the high heterogeneity. This poses a new challenge: generating more semantically enriched environments to enhance graph invariant learning under distribution shifts. In this paper, we propose a novel approach to generate hierarchical semantic environments for each graph. First, given an input graph, we explicitly extract variant subgraphs from it to generate proxy predictions on local environments. Then, stochastic attention mechanisms are employed to re-extract the subgraphs and regenerate global environments in a hierarchical manner. In addition, we introduce a new learning objective that guides our model to learn the diversity of environments within the same hierarchy while maintaining consistency across different hierarchies. This approach enables our model to consider the relationships between environments and facilitates robust graph invariant learning. Extensive experiments on real-world graph data demonstrate the effectiveness of our framework. In particular, on the challenging DrugOOD dataset, our method achieves up to 1.29% and 2.83% improvement over the best baselines on IC50 and EC50 prediction tasks, respectively.
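
To make the idea of a contrastive objective over environments more concrete, the following is a minimal sketch of a generic supervised-contrastive-style loss, in which embeddings that share a group id (for example, an inferred environment or a label) are pulled together and all others are pushed apart. It is not the paper's EnvCon or LabelCon formulation: the function name `group_contrastive_loss`, the `temperature` value, and the use of cosine similarity are all assumptions made for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def group_contrastive_loss(embeddings, groups, temperature=0.5):
    """Generic supervised-contrastive-style loss (illustrative only).

    For each anchor, samples with the same group id are positives and
    every other sample acts as a negative. `groups` could hold inferred
    environment ids or class labels; that choice is an assumption here.
    """
    z = [_normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and groups[j] == groups[i]]
        if not positives:
            continue  # anchors without a positive do not contribute
        # Temperature-scaled cosine similarity to every other sample.
        logits = {j: _dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical 2-D embeddings: the first two and last two are close together.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = group_contrastive_loss(embeddings, [0, 0, 1, 1])
mixed = group_contrastive_loss(embeddings, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the group ids agree with the geometry of the embeddings (`aligned`) than when they cut across it (`mixed`), which is the behavior a pull-together objective of this kind is meant to reward.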

Paper Structure

This paper contains 46 sections, 13 equations, 8 figures, 8 tables, and 1 algorithm.

Figures (8)

  • Figure 1: (a) Results on the $\textsc{IC50-sca}$ dataset from DrugOOD (Ji et al., 2023). (b) Flat environments from existing approaches. (c) Hierarchical environments from our method. For visualization, we set the number of real environments to $10$.
  • Figure 2: Our framework consists of (a) Hierarchical Stochastic Subgraph Generation in \ref{sec:HSSG}, (b) Hierarchical Semantic Environments in \ref{sec:HSE}, and (c) Robust GIL with Hierarchical Semantic Environments in \ref{sec:GIL}.
  • Figure 3: Discussion of the diversity of generated environments. We show the distributions of two generated environments, $env_0$ and $env_1$, for (a) random sampling methods, (b) flat environment inference methods, and (c) our hierarchical environment inference method. (d) We use the Kolmogorov-Smirnov test (Massey, 1951) to quantify the diversity of the three methods.
  • Figure 4: Illustration of the objective $\mathcal{L}_\text{EnvCon}$ in Inter-Hierarchy Environment Augmentation. (a) We pull the environment-based neighborhoods $\mathcal{N}^k_{p_e}(z^k_v)$ and $\mathcal{N}^{k-1}_{p_e}(z^k_v)$ toward the anchor variant subgraph embedding $z^k_v$. (b) A simple illustration of the anchor variant subgraph $z^{k-1}_v$ in the previous hierarchy $k-1$. (c) Notation used in the illustration.
  • Figure 5: Hyperparameter selection for $\alpha$, $\beta$, #Envs, and #Hierarchies.
  • ...and 3 more figures
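
The caption of Figure 3(d) mentions the Kolmogorov-Smirnov test as a diversity measure between generated environments. A minimal sketch of the two-sample KS statistic is given below, assuming each environment is summarized by a list of scalar values (for example, per-graph scores); which quantity the paper actually compares is not stated in this excerpt, so the sample data and the function name `ks_statistic` are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples. A value near 0 means the samples look alike; a value
    near 1 means they barely overlap.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical per-graph values for two generated environments.
env_0 = [1, 2, 3, 4]
env_1 = [3, 4, 5, 6]
diversity = ks_statistic(env_0, env_1)
```

Under this reading, a larger statistic between $env_0$ and $env_1$ would indicate more diverse environments. In practice a library routine such as SciPy's `ks_2samp` would also report a p-value, which this sketch omits.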