Data Imputation from the Perspective of Graph Dirichlet Energy

Weiqi Zhang, Guanlue Li, Jianheng Tang, Jia Li, Fugee Tsung

TL;DR

This work examines missing data imputation through the lens of graph Dirichlet energy, showing that standard draft steps reduce energy and that refinement must preserve energy to achieve high-quality imputation. It introduces the Graph Laplacian Pyramid Network (GLPN), a two-branch architecture that combines a U-shaped autoencoder for global structure with a residual graph deconvolution branch for local details, designed to maintain the Dirichlet energy $E_D$ while refining imputations. The authors establish energy-based bounds and demonstrate empirically that GLPN outperforms state-of-the-art baselines under the MCAR, MAR, and MNAR mechanisms, in both homophilous and heterophilous settings, including multi-graph datasets. The results suggest that energy-preserving design is a principled and effective strategy for robust graph-based missing data imputation.

Abstract

Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: initially crafting a preliminary imputation (the "draft") and then refining it to produce the final missing data imputation result, commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy. Therefore, a subsequent "refine" step is necessary to restore the overall energy balance. Existing refinement techniques, such as the Graph Convolutional Network (GCN), often result in further energy reduction. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
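
The observation that a basic draft lowers the Dirichlet energy can be illustrated on a toy graph. The sketch below is not taken from the GLPN repository: it assumes the unnormalized Laplacian $L = D - A$ (the paper's Definition 3.1 may use a normalized variant), and the helper names `dirichlet_energy` and `mean_impute`, the 4-node path graph, and the feature values are all hypothetical.

```python
import numpy as np


def dirichlet_energy(adj: np.ndarray, x: np.ndarray) -> float:
    """Return tr(X^T L X) with L = D - A (assumed unnormalized Laplacian).

    For a symmetric adjacency this equals 0.5 * sum_ij a_ij * ||x_i - x_j||^2.
    """
    lap = np.diag(adj.sum(axis=1)) - adj
    return float(np.trace(x.T @ lap @ x))


def mean_impute(x: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Draft imputation: fill unobserved entries with the observed column mean."""
    masked = np.where(observed, x, np.nan)
    col_mean = np.nanmean(masked, axis=0)
    return np.where(observed, x, col_mean)


# Hypothetical toy data: path graph 0-1-2-3 with an alternating signal.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
x_true = np.array([[1.0], [-1.0], [1.0], [-1.0]])
observed = np.array([[True], [False], [True], [True]])  # node 1 is missing

x_draft = mean_impute(x_true, observed)
e_true = dirichlet_energy(adj, x_true)
e_draft = dirichlet_energy(adj, x_draft)

print(f"ground truth energy: {e_true:.4f}")
print(f"draft energy:        {e_draft:.4f}")
print(f"relative energy:     {e_draft / e_true:.4f}")
```

In this example the missing node is filled with a value close to its neighbors, so the differences along its two edges shrink and the energy falls below that of the ground truth. This is a single hand-picked case, not a proof of the general tendency.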

Paper Structure

This paper contains 34 sections, 2 theorems, 26 equations, 5 figures, and 5 tables.

Key Result

Proposition 3.2

Suppose each element of $\bm X$ is drawn independently and identically from a distribution whose first and second moments satisfy $\mathbb E(\bm X_{i,j}) = 0$ and $\text{Var}(\bm X_{i,j}) = 1$, and the imputation $\hat{\bm X}$ satisfies $\hat{\bm X}_{i,j}=\sum_{k\in S_k}\alpha_{k}\bm X^{\

Figures (5)

  • Figure 1: The Dirichlet energy of different imputation strategies on three experimental datasets (METR-LA, PEMS, NREL) with different missing ratios under the missing-completely-at-random (MCAR) mechanism. Similar results can be obtained under other mechanisms. Relative Dirichlet energy is normalized by that of the ground truth features.
  • Figure 2: Overview of the proposed Graph Laplacian Pyramid Network. The U-shaped autoencoder extracts the clustering and coarse patterns, while the Residual Network reconstructs local details. By combining these two parts, the model can refine the draft imputation and preserve the Dirichlet energy. The dashboards indicate the relative Dirichlet energy of draft and refined features.
  • Figure 3: Imputation results on four benchmark datasets with different settings: MCAR (top), MAR (middle), and MNAR (bottom). Both RMSE (left) and MAE (right) are normalized by the performance of mean imputation.
  • Figure 4: RMSE for METR-LA with different data missing ratios. Both structure-free and structure-based baselines are compared under three missing mechanisms.
  • Figure 5: Relative Dirichlet energy for METR-LA imputation with different data missing ratios. The Dirichlet energy is normalized by that of the ground truth features, whose relative energy equals 1.
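
The relative Dirichlet energy in Figures 1 and 5 is the energy of a set of features divided by that of the ground truth. The sketch below shows how such a ratio could be computed, and why GCN-style refinement tends to push it further below 1. It makes several assumptions that are not the authors' setup: the normalized Laplacian $I - \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, propagation without learned weights or nonlinearities, and a hypothetical 6-node ring graph with random features.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style propagation matrix D~^{-1/2} (A + I) D~^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def normalized_energy(adj: np.ndarray, x: np.ndarray) -> float:
    """tr(X^T (I - P) X), where P is the normalized adjacency (assumed form)."""
    lap = np.eye(adj.shape[0]) - normalized_adjacency(adj)
    return float(np.trace(x.T @ lap @ x))


# Hypothetical toy data: a ring of 6 nodes with 3 random feature channels.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = 1.0
    adj[(i + 1) % n, i] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((n, 3))

# Apply the propagation step repeatedly and track the energy.
p = normalized_adjacency(adj)
h = x
energies = [normalized_energy(adj, h)]
for _ in range(3):
    h = p @ h
    energies.append(normalized_energy(adj, h))

# Normalize by the energy of the original features, as in the figures.
relative = [e / energies[0] for e in energies]
print([round(r, 4) for r in relative])
```

Because the eigenvalues of the propagation matrix lie in $(-1, 1]$, each step scales every spectral component of the energy by a factor of at most 1, so the ratio never increases. A trained GCN with weights and activations can behave differently, so this only motivates the smoothing tendency the abstract describes.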

Theorems & Definitions (6)

  • Definition 3.1: Graph Dirichlet Energy
  • Proposition 3.2
  • Proposition 5.1
  • proof
  • proof
  • proof