Data Imputation from the Perspective of Graph Dirichlet Energy

Weiqi Zhang; Guanlue Li; Jianheng Tang; Jia Li; Fugee Tsung

TL;DR

This work examines missing data imputation through the lens of graph Dirichlet energy. It shows that standard draft imputation steps reduce the Dirichlet energy, and argues that the refinement step must preserve energy to achieve high-quality imputation. It introduces the Graph Laplacian Pyramid Network (GLPN), a two-branch architecture that combines a U-shaped autoencoder for global structure with a residual graph deconvolution branch for local details, designed to maintain the Dirichlet energy $E_D$ while refining imputations. The authors establish energy-based bounds and demonstrate empirically that GLPN outperforms state-of-the-art baselines under the MCAR, MAR, and MNAR missing mechanisms, in both homophilous and heterophilous settings, and on multi-graph datasets. The results suggest that energy-preserving design is a principled and effective strategy for graph-based missing data imputation.

Abstract

Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: initially crafting a preliminary imputation (the "draft") and then refining it to produce the final missing data imputation result, commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy. Therefore, a subsequent "refine" step is necessary to restore the overall energy balance. Existing refinement techniques, such as the Graph Convolutional Network (GCN), often result in further energy reduction. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN incorporates a U-shaped autoencoder and residual networks to capture both global and local details effectively. Through extensive experiments on multiple real-world datasets, GLPN consistently outperforms state-of-the-art methods across three different missing data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
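To make the energy argument concrete, the following is a minimal sketch of how a Dirichlet energy can be computed and compared before and after a simple draft imputation. It assumes the common form $\operatorname{tr}(\bm X^\top \bm L \bm X)$ with the unnormalized Laplacian $\bm L = \bm D - \bm A$; the paper's Definition 3.1 may use a normalized variant. The function name `dirichlet_energy`, the toy path graph, and the mean-fill draft are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dirichlet_energy(X, A):
    """Return tr(X^T L X) with L = D - A (unnormalized Laplacian; an assumption)."""
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))

# Hypothetical toy graph: a path 0-1-2-3 with one feature per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X_true = np.array([[0.0], [2.0], [0.0], [2.0]])

# Hide node 1 and fill it with the mean of the observed entries (a simple draft).
observed = np.array([True, False, True, True])
X_draft = X_true.copy()
X_draft[~observed] = X_true[observed].mean()

energy_true = dirichlet_energy(X_true, A)    # sum of squared differences over edges
energy_draft = dirichlet_energy(X_draft, A)  # smoother signal, lower energy here
print(energy_true, energy_draft)
```

In this toy case the mean-filled signal is smoother than the ground truth, so its energy is lower. That matches the observation in the abstract for this one example only; it is not a general proof.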

Paper Structure (34 sections, 2 theorems, 26 equations, 5 figures, 5 tables)

Introduction
Related Work
Data Imputation
Laplacian Pyramid and Graph U-Net
Preliminaries
Task Definition
Graph Dirichlet Energy
Draft-then-Refine Imputation
Model Design
Draft Imputation
GLPN Architecture
U-shaped Autoencoder
Residual Network
Energy Maintenance Analysis
Graph Convolutional Networks
...and 19 more sections

Key Result

Proposition 3.2

Suppose each element in $\bm X$ is drawn independently and identically from a distribution whose first and second moments satisfy $\mathbb E(\bm X_{i,j}) = 0$ and $\text{Var}(\bm X_{i,j}) = 1$, and the imputation $\hat{\bm X}$ satisfies $\hat{\bm X}_{i,j}=\sum_{k\in S_k}\alpha_{k}\bm X^{\

Figures (5)

Figure 1: The Dirichlet energy of different imputation strategies on three experimental datasets (METR-LA, PEMS, NREL) with different missing ratios under missing-completely-at-random (MCAR) mechanism. Similar results can be obtained under other mechanisms. Relative Dirichlet energy is normalized by that of the ground truth features.
Figure 2: Overview of proposed Graph Laplacian Pyramid Network. The U-shaped autoencoder extracts the clustering and coarse patterns, while the Residual Network reconstructs local details. By combining these two parts, the model can refine the draft imputation and preserve the Dirichlet energy. The dashboards indicate the relative Dirichlet energy of draft and refined features.
Figure 3: Imputation results on four benchmark datasets with different settings: MCAR (top), MAR (middle), and MNAR (bottom). Both RMSE (left) and MAE (right) are normalized by the performance of mean imputation.
Figure 4: RMSE for METR-LA with different data missing ratios. Both structure-free and structure-based baselines are compared under three missing mechanisms.
Figure 5: Relative Dirichlet energy for METR-LA imputation with different data missing ratios. The Dirichlet energy is normalized by the ground truth features, whose relative energy equals 1.
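The normalizations used in these captions can be sketched as follows: relative Dirichlet energy divides the energy of the imputed features by that of the ground truth, and normalized RMSE/MAE divide a method's error by the error of mean imputation. The sketch assumes the unnormalized Laplacian form of the energy and evaluates errors on missing entries only. All function names, the toy 4-node cycle graph, and the candidate values are hypothetical and are not taken from the paper.

```python
import numpy as np

def dirichlet_energy(X, A):
    # tr(X^T L X) with L = D - A (assumed form of the energy)
    L = np.diag(A.sum(axis=1)) - A
    return float(np.trace(X.T @ L @ X))

def relative_energy(X_hat, X_true, A):
    # Equals 1 when X_hat reproduces the ground-truth energy
    return dirichlet_energy(X_hat, A) / dirichlet_energy(X_true, A)

def rmse(X_hat, X_true, missing):
    return float(np.sqrt(np.mean((X_hat[missing] - X_true[missing]) ** 2)))

def mae(X_hat, X_true, missing):
    return float(np.mean(np.abs(X_hat[missing] - X_true[missing])))

# Hypothetical toy data: a 4-node cycle with two features per node.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X_true = np.array([[1.0, 0.0], [3.0, 2.0], [1.0, 4.0], [5.0, 2.0]])
missing = np.zeros_like(X_true, dtype=bool)
missing[1, 0] = True
missing[2, 1] = True

# Mean imputation: fill each missing entry with its column's observed mean.
col_mean = np.nanmean(np.where(missing, np.nan, X_true), axis=0)
X_mean = np.where(missing, col_mean, X_true)

# A made-up candidate imputation, standing in for any refined output.
X_cand = X_true.copy()
X_cand[1, 0] = 2.5
X_cand[2, 1] = 3.0

norm_rmse = rmse(X_cand, X_true, missing) / rmse(X_mean, X_true, missing)
norm_mae = mae(X_cand, X_true, missing) / mae(X_mean, X_true, missing)
rel_energy_mean = relative_energy(X_mean, X_true, A)
print(norm_rmse, norm_mae, rel_energy_mean)
```

Under this convention, a normalized error below 1 means the method beats mean imputation, and a relative energy near 1 means the imputed features retain roughly the energy of the ground truth.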

Theorems & Definitions (6)

Definition 3.1: Graph Dirichlet Energy
Proposition 3.2
Proposition 5.1
proof
proof
proof
