Data Imputation from the Perspective of Graph Dirichlet Energy
Weiqi Zhang, Guanlue Li, Jianheng Tang, Jia Li, Fugee Tsung
TL;DR
This work examines missing data imputation through the lens of graph Dirichlet energy ($E_D$). It shows that standard draft steps reduce $E_D$, and that high-quality imputation requires a refinement step that preserves this energy rather than reducing it further. It introduces the Graph Laplacian Pyramid Network (GLPN), a two-branch architecture that combines a U-shaped autoencoder for global structure with a residual graph deconvolution branch for local details, designed to maintain $E_D$ while refining imputations. The authors establish energy-based bounds and show empirically that GLPN outperforms state-of-the-art baselines under the missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) mechanisms, on both homophilous and heterophilous graphs, including multi-graph datasets. The results suggest that energy-preserving design is a principled and effective strategy for robust graph-based missing data imputation.
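For reference, here is one standard definition of graph Dirichlet energy. It is assumed here for illustration, and the paper's exact normalization may differ. Let $X \in \mathbb{R}^{n \times d}$ be a feature matrix with rows $x_i$, let $A$ be the adjacency matrix, and let $\tilde A = A + I$ have degrees $\tilde d_i$:

$$
E_D(X) = \operatorname{tr}\!\left(X^\top \tilde L X\right)
= \frac{1}{2}\sum_{i,j} \tilde a_{ij}\left\| \frac{x_i}{\sqrt{\tilde d_i}} - \frac{x_j}{\sqrt{\tilde d_j}} \right\|_2^2,
\qquad \tilde L = I - \tilde D^{-1/2}\tilde A\,\tilde D^{-1/2}.
$$

Under this definition, a low $E_D$ means neighboring rows are nearly identical after degree normalization (a smooth, possibly over-smoothed signal). A high $E_D$ means sharp variation across edges.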
Abstract
Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach, commonly referred to as "draft-then-refine": they first craft a preliminary imputation (the "draft") and then refine it to produce the final imputation. In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy. Therefore, a subsequent "refine" step is necessary to restore the overall energy balance. Existing refinement techniques, such as the Graph Convolutional Network (GCN), often result in further energy reduction. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
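The energy argument above can be illustrated with a minimal NumPy sketch. It rests on three assumptions: the energy uses the augmented normalized Laplacian, the "draft" is plain column-mean imputation, and the "refine" step is a single weight-free GCN-style propagation. All function names (`augmented_laplacian`, `dirichlet_energy`, `mean_impute`, `gcn_smooth`) are hypothetical and do not come from the GLPN code.

```python
import numpy as np


def augmented_laplacian(adj: np.ndarray) -> np.ndarray:
    """Augmented normalized Laplacian: I - D~^{-1/2} (A + I) D~^{-1/2}."""
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))    # D~^{-1/2} as a vector
    a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    return np.eye(n) - a_hat


def dirichlet_energy(x: np.ndarray, lap: np.ndarray) -> float:
    """E_D(X) = tr(X^T L X); small values mean a smooth signal over the graph."""
    return float(np.trace(x.T @ lap @ x))


def mean_impute(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical draft step: replace missing entries (mask == False)
    with the mean of the observed entries in the same column."""
    col_means = np.nanmean(np.where(mask, x, np.nan), axis=0)
    return np.where(mask, x, col_means[None, :])


def gcn_smooth(x: np.ndarray, lap: np.ndarray) -> np.ndarray:
    """Weight-free GCN-style propagation A_hat @ X, with A_hat = I - L."""
    return (np.eye(lap.shape[0]) - lap) @ x


if __name__ == "__main__":
    # Toy case: path graph 0-1-2 with the middle value missing.
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    lap = augmented_laplacian(adj)
    x_true = np.array([[0.0], [1.0], [0.0]])
    mask = np.array([[True], [False], [True]])
    x_draft = mean_impute(x_true, mask)
    print("true :", dirichlet_energy(x_true, lap))   # approx. 0.667 (= 2/3)
    print("draft:", dirichlet_energy(x_draft, lap))  # 0.0

    # Random graph with ~30% of entries missing: draft, then one smoothing step.
    rng = np.random.default_rng(0)
    n, d = 50, 8
    upper = np.triu(rng.random((n, n)) < 0.1, k=1)
    adj = (upper | upper.T).astype(float)            # symmetric, no self-loops
    lap = augmented_laplacian(adj)
    x = rng.normal(size=(n, d))
    mask = rng.random((n, d)) > 0.3                  # True = observed
    draft = mean_impute(x, mask)
    refined = gcn_smooth(draft, lap)
    for name, m in [("true", x), ("draft", draft), ("refined", refined)]:
        print(f"{name:8s} E_D = {dirichlet_energy(m, lap):.3f}")
```

In the toy case, filling the missing value with the observed mean flattens the signal to zero, so its energy drops from 2/3 to 0. For the smoothing step, the eigenvalues $\lambda$ of this Laplacian lie in $[0, 2]$, so propagation multiplies the energy of each eigen-component by $(1-\lambda)^2 \le 1$ and can never raise $E_D$. A trained GCN layer adds weights and nonlinearities, so this sketch only illustrates the low-pass tendency described in the abstract. It does not prove that the tendency holds for every GCN.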
