A Survey of Out-of-distribution Generalization for Graph Machine Learning from a Causal View

Jing Ma

TL;DR

A thorough review of recent progress in causality-involved GML generalization that also explores the incorporation of causality in other related areas of trustworthy GML, such as explanation, fairness, and robustness.

Abstract

Graph machine learning (GML) has been successfully applied across a wide range of tasks. Nonetheless, GML faces significant challenges in generalizing over out-of-distribution (OOD) data, which raises concerns about its wider applicability. Recent advancements have underscored the crucial role of causality-driven approaches in overcoming these generalization challenges. Distinct from traditional GML methods that primarily rely on statistical dependencies, causality-focused strategies delve into the underlying causal mechanisms of data generation and model prediction, thus significantly improving the generalization of GML across different environments. This paper offers a thorough review of recent progress in causality-involved GML generalization. We elucidate the fundamental concepts of employing causality to enhance graph model generalization and categorize the various approaches, providing detailed descriptions of their methodologies and the connections among them. Furthermore, we explore the incorporation of causality in other related important areas of trustworthy GML, such as explanation, fairness, and robustness. Concluding with a discussion on potential future research directions, this review seeks to articulate the continuing development and future potential of causality in enhancing the trustworthiness of graph machine learning.

Paper Structure (15 sections, 2 equations, 2 figures, 1 table)

This paper contains 15 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: The representative methods for causality-involved GML OOD generalization.
  • Figure 2: An overview of causal graphs employed in representative GML OOD generalization methods, including CAL [CAL_2022], CAL+ [CAL+_2024], CaNet [CaNet_2024], DSE [DSE_2022], DisC [DisC_2022], CIGA [CIGA], E-invariant GR [E_invariant_2021], and gMPNN [gMPNN_2022]. $G$, $C$, $S$, $Y$, $Z$, and $E$ denote graphs, causal variables, spurious variables, prediction labels, graph representations, and environments, respectively. $G_C$ and $G_S$ in (e) represent the causal subgraph and the spurious subgraph, while $G_S$ and $G_S^*$ in (c) represent an explanatory subgraph and a surrogate subgraph, respectively. In causal graphs with nodes in different colors, observed variables are grey and unobserved variables are white. Dashed lines show unknown correlations.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2