Drop the mask! GAMM - A Taxonomy for Graph Attributes Missing Mechanisms
Richard Serrano, Baptiste Jeudy, Charlotte Laclau, Christine Largeron
TL;DR
This work extends missing-data theory to attributed graphs by introducing GAMM (Graph Attributes Missing Mechanisms), a taxonomy that ties the probability of an entry being missing to node attributes and graph structure through a generic function, $P(\Omega_{ij}=0)=g(\cdot)$. It classifies mechanisms as Attribute-based, Structural, Neighborhood, or Generic combinations of these, and discusses the identifiability challenges of distinguishing Missing At Random (MAR) from Missing Not At Random (MNAR) mechanisms in graphs. An extensive experimental protocol covering twelve real-world graphs and multiple missingness masks shows that graph-aware missingness significantly degrades state-of-the-art imputers. MNAR mechanisms with neighborhood dependencies in heterophilic graphs pose the most severe challenges, and imputers often distort the attribute distributions even when reconstruction accuracy appears adequate. The authors provide open-source code and emphasize the need for robust imputation methods that account for graph structure, so that attribute distributions are recovered faithfully and downstream inference remains reliable.
Abstract
Exploring missing data in attributed graphs introduces unique challenges beyond those found in tabular datasets. In this work, we extend the taxonomy for missing data mechanisms to attributed graphs by proposing GAMM (Graph Attributes Missing Mechanisms), a framework that systematically links missingness probability to both node attributes and the underlying graph structure. Our taxonomy enriches the conventional definitions of masking mechanisms by introducing graph-specific dependencies. We empirically demonstrate that state-of-the-art imputation methods, while effective on traditional masks, significantly struggle when confronted with these more realistic graph-aware missingness scenarios.
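To make the idea of a graph-aware mask concrete, the following is a minimal sketch of how a missingness probability $P(\Omega_{ij}=0)=g(\cdot)$ could be made to depend on attributes, structure, or neighborhoods. It is not the authors' implementation: the logistic form of $g$, the use of node degree as the structural signal, the neighbor-mean aggregation, and all function and parameter names (`missing_prob`, `sample_mask`, `rate`, `strength`) are assumptions chosen for illustration. The convention $\Omega_{ij}=1$ for observed and $\Omega_{ij}=0$ for missing follows the formula above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def standardize(v):
    """Center and scale along axis 0; constant inputs map to zeros."""
    v = np.asarray(v, dtype=float)
    std = v.std(axis=0, keepdims=True)
    return (v - v.mean(axis=0, keepdims=True)) / np.where(std > 0, std, 1.0)


def missing_prob(X, A, mechanism, rate=0.3, strength=2.0):
    """Return an (n, d) matrix of P(Omega_ij = 0) for a toy mechanism g.

    X: (n, d) node attributes; A: (n, n) symmetric adjacency matrix.
    The three mechanisms below are illustrative assumptions, not the
    definitions used in the paper.
    """
    n, d = X.shape
    deg = A.sum(axis=1)
    if mechanism == "attribute":
        # Depends on the attribute value itself (self-masking, MNAR-like).
        score = standardize(X)
    elif mechanism == "structural":
        # Depends only on the node's degree, identical for every attribute.
        score = np.repeat(standardize(deg)[:, None], d, axis=1)
    elif mechanism == "neighborhood":
        # Depends on the mean attribute value over the node's neighbors.
        neigh_mean = (A @ X) / np.maximum(deg, 1)[:, None]
        score = standardize(neigh_mean)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # The offset sets the missing probability to `rate` when the score is 0.
    offset = np.log(rate / (1.0 - rate))
    return sigmoid(offset + strength * score)


def sample_mask(X, A, mechanism, rate=0.3, strength=2.0, seed=0):
    """Sample Omega with Omega_ij = 1 if observed and 0 if missing."""
    rng = np.random.default_rng(seed)
    p_missing = missing_prob(X, A, mechanism, rate, strength)
    return (rng.random(X.shape) >= p_missing).astype(int)


# Toy example: a random undirected graph with Gaussian attributes.
rng = np.random.default_rng(42)
n, d = 200, 3
A = np.triu((rng.random((n, n)) < 0.05).astype(float), k=1)
A = A + A.T
X = rng.normal(size=(n, d))

for mechanism in ("attribute", "structural", "neighborhood"):
    mask = sample_mask(X, A, mechanism)
    print(mechanism, "empirical missing rate:", round(1 - mask.mean(), 3))
```

In this sketch the empirical missing rate only roughly tracks `rate`, because the logistic offset fixes the probability at a score of zero rather than the average over all entries. A mask whose probability ignores both `X` and `A` would correspond to the classical missing-completely-at-random baseline against which such graph-aware masks can be compared.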
