Data Imputation with Iterative Graph Reconstruction

Jiajun Zhong, Weiwei Ye, Ning Gui

TL;DR

IGRM tackles missing data in tabular datasets by introducing a learnable friend network that differentiates sample relevance during imputation. The method jointly optimizes the friend network and a bipartite-graph imputation model in an end-to-end loop, using differentiable structure augmentation and sample-embedding guidance to improve information flow. On eight real-world datasets at a 30% missing ratio, it lowers mean absolute error (MAE) by 39.13% relative to nine baselines and by 9.04% relative to the second-best method, and it remains robust across varying missing ratios. The work highlights the practical value of encoding sample-sample relations via a learned friend network in graph-based imputation pipelines.

Abstract

Effective data imputation demands rich latent "structure" discovery capabilities from "plain" tabular data. Recent graph neural network-based data imputation solutions show strong structure-learning potential by directly translating tabular data into bipartite graphs. However, because these graphs contain no relations between samples, such solutions treat all samples equally, which contradicts one important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent different relations among samples. To generate an accurate friend network from data with missing values, an end-to-end friend network reconstruction solution is designed to allow continuous friend network optimization during imputation learning. The representation of the optimized friend network, in turn, is used to further optimize the data imputation process with differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error than nine baselines and 9.04% lower than the second-best. Our code is available at https://github.com/G-AILab/IGRM.
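The two graph structures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the NaN convention for missing cells, the zero-filled stand-in embeddings, and the fixed k-nearest-neighbor rule are all assumptions made for illustration. In IGRM the friend network is learned and re-optimized during training rather than built once from cosine similarity.

```python
import numpy as np

def table_to_bipartite(X):
    """Return (sample_idx, feature_idx, value) for every observed cell of X.

    Assumption: NaN marks a missing cell, and a missing cell yields no edge.
    """
    rows, cols = np.nonzero(~np.isnan(X))
    return rows, cols, X[rows, cols]

def knn_friend_network(emb, k):
    """Hypothetical helper: link each sample to its k most cosine-similar samples."""
    norm = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norm, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # exclude self-links
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # top-k neighbors per sample
    adj = np.zeros(sim.shape, dtype=bool)
    adj[np.repeat(np.arange(len(emb)), k), nbrs.ravel()] = True
    return adj | adj.T                      # make the sample-sample graph undirected

# Toy table: 3 samples, 3 features, NaN = missing.
X = np.array([[1.0, np.nan, 3.0],
              [1.1, 2.0, np.nan],
              [np.nan, -5.0, -4.0]])
rows, cols, vals = table_to_bipartite(X)

# Zero-filled rows stand in for learned sample embeddings (assumption).
friends = knn_friend_network(np.nan_to_num(X), k=1)
```

Here `rows`, `cols`, and `vals` describe the sample-feature edges of the bipartite graph, and `friends` is a boolean sample-sample adjacency matrix that a message-passing model could use to weight similar samples differently.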

Paper Structure (29 sections, 12 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: Imputation with a friend network. Gray indicates missing values. Similar samples are marked with the same color, for the ground truth in (a) and for the data with missing values in (b). (c) shows the trend of IGRM when an ideal friend network is added, on the Concrete dataset from UCI. (d) shows the similarity difference between the ground truth and the data with missing values.
  • Figure 2: Overall architecture of the proposed IGRM framework in $t$-th iteration.
  • Figure 3: Averaged MAE of feature imputation with different missing ratios over five trials.
  • Figure 4: The t-SNE embeddings of nodes from IGRM and GRAPE. The value in parentheses indicates the silhouette coefficient.
  • Figure 5: Trends of the similarity deviation distribution during training with a missing ratio of 0.3. The deviation is the absolute value of the difference between the similarity of the generated embeddings and the ground-truth similarity.
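
The similarity deviation defined in the Figure 5 caption can be sketched as follows. The caption does not say which similarity measure is used, so cosine similarity is an assumption here, as are the function names and the choice to report one value per unordered pair of distinct samples.

```python
import numpy as np

def cosine_similarity_matrix(Z):
    """Pairwise cosine similarity between the rows of Z."""
    norm = np.linalg.norm(Z, axis=1, keepdims=True)
    unit = Z / np.clip(norm, 1e-12, None)
    return unit @ unit.T

def similarity_deviation(emb, X_true):
    """|sim(emb) - sim(X_true)| for each unordered pair of distinct samples."""
    dev = np.abs(cosine_similarity_matrix(emb) - cosine_similarity_matrix(X_true))
    upper = np.triu_indices(len(emb), k=1)  # skip the diagonal and mirrored pairs
    return dev[upper]

# Toy ground-truth table with 3 samples and 2 features.
X_true = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

# Embeddings that preserve every pairwise angle give zero deviation.
perfect = similarity_deviation(2.0 * X_true, X_true)

# Embeddings that collapse all samples to one point give large deviations.
collapsed = similarity_deviation(np.ones((3, 2)), X_true)
```

A distribution of these values that shifts toward zero over training would indicate that the generated embeddings increasingly reflect the ground-truth sample similarities.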