Learning Invariant Graph Representations Through Redundant Information
Barproda Halder, Pasan Dissanayake, Sanghamitra Dutta
TL;DR
This work addresses the challenge of out-of-distribution generalization for graph classification by leveraging Partial Information Decomposition to focus on redundant information shared between invariant and spurious subgraphs. It introduces RIG, a multi-level, alternating-optimization framework that disentangles causal from spurious graph components by maximizing redundancy and using a contrastive objective, guided by an environment assistant. Theoretical connections between SCMs and PID motivate the maximization of redundant information as a central objective, and extensive experiments on synthetic and real-world datasets demonstrate improved OOD robustness over strong baselines. Collectively, the approach provides a principled, information-theoretic mechanism to derive more reliable invariant graph representations for diverse distribution shifts.
Abstract
Learning invariant graph representations for out-of-distribution (OOD) generalization remains challenging because the learned representations often retain spurious components. To address this challenge, this work introduces a tool from information theory, Partial Information Decomposition (PID), that goes beyond classical information-theoretic measures. We identify limitations in existing approaches to invariant representation learning that rely solely on classical information-theoretic measures, motivating a precise focus on the redundant information about the target $Y$, as defined by PID, that is shared between spurious subgraphs $G_s$ and invariant subgraphs $G_c$. Next, we propose a new multi-level optimization framework, which we call Redundancy-guided Invariant Graph learning (RIG), that maximizes redundant information while isolating spurious and causal subgraphs, enabling OOD generalization under diverse distribution shifts. Our approach alternates between estimating a lower bound of redundant information (which itself requires an optimization) and maximizing it along with additional objectives. Experiments on both synthetic and real-world graph datasets demonstrate the generalization capabilities of our proposed RIG framework.
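
For readers unfamiliar with PID, the standard two-source decomposition can be written as follows. This is a sketch in generic PID notation, not necessarily the notation used in the paper: the symbols $\mathrm{Red}$, $\mathrm{Uni}$, and $\mathrm{Syn}$ are assumed names for the redundant, unique, and synergistic terms, and the specific redundancy measure adopted by RIG is not stated here.

$$
\begin{aligned}
I(Y; G_c, G_s) &= \mathrm{Red}(Y : G_c, G_s) + \mathrm{Uni}(Y : G_c \setminus G_s) + \mathrm{Uni}(Y : G_s \setminus G_c) + \mathrm{Syn}(Y : G_c, G_s), \\
I(Y; G_c) &= \mathrm{Red}(Y : G_c, G_s) + \mathrm{Uni}(Y : G_c \setminus G_s), \\
I(Y; G_s) &= \mathrm{Red}(Y : G_c, G_s) + \mathrm{Uni}(Y : G_s \setminus G_c).
\end{aligned}
$$

Classical quantities such as $I(Y; G_c)$ mix the redundant and unique terms, which is why an objective stated only in terms of mutual information cannot target $\mathrm{Red}(Y : G_c, G_s)$ on its own.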
