DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification

Xiaoxue Han, Huzefa Rangwala, Yue Ning

TL;DR

A more realistic graph data generation model using Structural Causal Models (SCMs) is introduced, allowing us to redefine distribution shifts by pinpointing their origins within the generation process, and a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings is proposed.

Abstract

Graph Neural Networks (GNNs) are susceptible to distribution shifts, creating vulnerability and security issues in critical domains. There is a pressing need to enhance the generalizability of GNNs on out-of-distribution (OOD) test data. Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data generation process, which do not adequately reflect the actual dynamics of distribution shifts in graphs. In this paper, we introduce a more realistic graph data generation model using Structural Causal Models (SCMs), allowing us to redefine distribution shifts by pinpointing their origins within the generation process. Building on this, we propose a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings. We provide a detailed theoretical framework that shows how our approach can effectively mitigate the impact of various distribution shifts. We evaluate DeCaf across both real-world and synthetic datasets that demonstrate different patterns of shifts, confirming its efficacy in enhancing the generalizability of GNNs.

Paper Structure

This paper contains 33 sections, 3 theorems, 81 equations, 4 figures, 12 tables, 1 algorithm.

Key Result

Proposition 1

Under covariate shift, the correlation between $\mathbf{x}$ and $\mathbf{a}$ shifts (e.g., $P^\text{train}(\mathbf{a}|\mathbf{x}) \neq P^\text{test}(\mathbf{a}|\mathbf{x})$ or $P^\text{train}(\mathbf{x}|\mathbf{a}) \neq P^\text{test}(\mathbf{x}|\mathbf{a})$). Consequently, the conditional distribution

Figures (4)

  • Figure 1: An overview of the conceptual flow. First, we introduce a new data generation process with the SCM. In graph decoupling, we separately estimate the individual impact of the node features and of the neighborhood representations on node labels. To obtain an unbiased estimate of each impact, we treat it as a treatment effect, which can be estimated with a causal estimation model that accounts for the confounding effect.
  • Figure 2: (a) SCMs representing a node's data generation process. (a-left) is the original SCM, and (a-right) is the modified SCM obtained by replacing the causal link between $z$ and $y$ with pseudo-causal links. (b) The two-view SCMs for causal effect estimation. For SCM-X, $\mathbf{a}$ is the treatment, and $\mathbf{x}$ is the confounder. For SCM-A, the treatment and the confounder are reversed.
  • Figure 3: Distribution of F1 scores of different models on OGB-elliptic, shown as a bar plot. The dashed line shows the mean F1 score of the ERM method.
  • Figure 4: Hotelling's two-sample t-squared statistic of node feature embeddings between subgraphs from different time periods in the OGB-elliptic dataset.
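
For readers unfamiliar with the statistic named in the Figure 4 caption, the following is a minimal sketch of the standard two-sample Hotelling's $T^2$ with a pooled covariance estimate. It is not the authors' code: the function name `hotelling_t2` and the toy inputs are illustrative assumptions, and the sketch assumes the pooled covariance is invertible (roughly, more samples than embedding dimensions).

```python
import numpy as np


def hotelling_t2(a, b):
    """Two-sample Hotelling's T^2 for samples a (n1, p) and b (n2, p).

    Hypothetical helper for illustration; assumes the pooled covariance
    matrix is non-singular.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.shape[0], b.shape[0]

    # Difference between the two sample mean vectors.
    diff = a.mean(axis=0) - b.mean(axis=0)

    # Pooled unbiased covariance; atleast_2d covers the p == 1 case.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((n1 - 1) * cov_a + (n2 - 1) * cov_b) / (n1 + n2 - 2)

    # T^2 = (n1 * n2 / (n1 + n2)) * diff^T pooled^{-1} diff,
    # computed with a linear solve rather than an explicit inverse.
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))


# Toy usage: two small point clouds whose means differ along one axis.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
shifted = square + np.array([1.0, 0.0])
t2 = hotelling_t2(square, shifted)
```

A larger value indicates that the two groups of embeddings have more clearly separated means relative to their shared spread, which is how such a statistic can be read as a measure of shift between time periods.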

Theorems & Definitions (10)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof