DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs

Xindi Zheng, Yuwei Wu, Yu Pan, Wanyu Lin, Lei Ma, Jianjun Zhao

TL;DR

DPGAN addresses missing data imputation on graphs by combining a dual-path generator (GraphUnet++ for structural, local cues and MLPUnet++ for global, high-frequency information) with a subgraph discriminator that enforces local fidelity through patch-like adversarial feedback. The model is trained with a Wasserstein GAN objective with gradient penalty plus a reconstruction term, which supports robust imputation across a range of missing rates. Empirical results on both multi-graph and single-graph benchmarks show DPGAN achieving state-of-the-art RMSE, with improvements of 2.99%–27.6% over baselines, and better downstream task performance; ablations highlight the importance of the dual-path design and subgraph-level discrimination. The work advances practical graph data imputation by addressing over-smoothing and training stability, providing a scalable framework for imputing graph attributes in real-world datasets. The code is publicly available, supporting reproducibility and adoption in related graph-learning tasks.
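
The training objective named above can be written out in the standard WGAN-GP form. This is only a sketch under stated assumptions, not the paper's exact loss: the weights \lambda and \beta, the masked squared-error form of the reconstruction term, and the symbols M (binary mask of observed entries), X (input features) and \tilde{X} (generator output) are introduced here for illustration. D denotes the subgraph discriminator acting as the critic, P_r and P_g the real and generated distributions, and \hat{x} a point sampled on a straight line between a real and a generated sample, as in the usual gradient-penalty formulation.

```latex
\begin{aligned}
\mathcal{L}_D &= \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
              - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
              + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
                \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
\mathcal{L}_G &= -\,\mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
              + \beta\, \mathcal{L}_{\mathrm{rec}},
\qquad
\mathcal{L}_{\mathrm{rec}} = \big\lVert M \odot (\tilde{X} - X) \big\rVert_F^2 .
\end{aligned}
```

Under this reading, the gradient penalty keeps the critic approximately 1-Lipschitz, while the reconstruction term ties the generator output to the entries that were actually observed.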

Abstract

Missing data imputation poses a paramount challenge when dealing with graph data. Prior works are typically based on feature propagation or graph autoencoders. However, these methods usually suffer from over-smoothing when dealing with missing data, as their graph neural network (GNN) modules are not explicitly designed for handling missing data. This paper proposes a novel framework, called Dual-Path Generative Adversarial Network (DPGAN), that can simultaneously handle missing data and avoid over-smoothing. The crux of our work is that it admits both global and local representations of the input graph signal, which can capture long-range dependencies. This is realized via our proposed generator, consisting of two key components, i.e., MLPUNet++ and GraphUNet++. Our generator is trained with a designated discriminator via an adversarial process. In particular, rather than assessing the entire graph as done in the literature, our discriminator focuses on local subgraph fidelity, thereby boosting the quality of the local imputation. The subgraph size is adjustable, allowing for control over the intensity of adversarial regularization. Comprehensive experiments across various benchmark datasets substantiate that DPGAN consistently rivals, if not outperforms, existing state-of-the-art imputation algorithms. The code is provided at https://github.com/momoxia/DPGAN.

Paper Structure (26 sections, 10 equations, 5 figures, 6 tables)

This paper contains 26 sections, 10 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: GNNs typically impute missing values by leveraging information from neighboring nodes. In the example graph, the missing value at node 1 is imputed incorrectly, which can be attributed to the influence of its neighboring nodes.
  • Figure 2: Overview of GraphUnet++, which incorporates graph convolutional layers, node-mix MLPs, graph pooling, and graph unpooling. The combination of GCN and node-mix MLP is used to extract both global and local representations of the graph. The node-mix MLP facilitates information exchange between nodes, with parameters shared across all layers. Graph pooling uses LEConv to calculate an impact score for each node in its cluster, then selects the top-k scoring nodes while maintaining the connectivity of the subgraphs. Unpooling places the features of the pooled graph back onto their original corresponding nodes.
  • Figure 3: The MLPUnet++ network consists of node-mix MLPs and feature-mix MLPs, each made up of two fully connected layers and a LeakyReLU nonlinearity. Skip connections are also included.
  • Figure 4: Subgraph discriminator: after one graph convolution layer with a node-mix layer and one graph pooling layer, the graph is reduced to 4 nodes. Each node represents the fidelity of the subgraph at its position.
  • Figure 5: Imputation results under various missing rates. Our model remains effective across missing rates; remarkably, it can reconstruct features relying solely on structure. The final alpha value of the model changes with the missing rate: a lower missing rate results in a higher alpha value, indicating the dominant role of our designed MLPUnet.
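
The building blocks described in the Figure 3 and Figure 5 captions can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the class and function names, the hidden width, the omission of biases, and in particular the convex-combination form of `blend_paths` (with alpha weighting the MLP path) are assumptions made here for illustration. The captions only state that each mixing MLP has two fully connected layers with a LeakyReLU, that skip connections are used, and that a higher alpha indicates a more dominant MLPUnet.

```python
import random


def leaky_relu(v, slope=0.01):
    """LeakyReLU applied to a single scalar."""
    return v if v > 0 else slope * v


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TwoLayerMLP:
    """Two fully connected layers with a LeakyReLU in between (biases omitted)."""

    def __init__(self, dim, hidden, rng):
        self.w1 = random_matrix(dim, hidden, rng)
        self.w2 = random_matrix(hidden, dim, rng)

    def __call__(self, x):
        h = [[leaky_relu(v) for v in row] for row in matmul(x, self.w1)]
        return matmul(h, self.w2)


class MixerBlock:
    """Node-mix MLP followed by feature-mix MLP, each with a skip connection."""

    def __init__(self, num_nodes, num_features, hidden, rng):
        self.node_mix = TwoLayerMLP(num_nodes, hidden, rng)
        self.feature_mix = TwoLayerMLP(num_features, hidden, rng)

    def __call__(self, x):
        # x has shape (num_nodes, num_features).
        # Node mixing acts along the node axis, so transpose before and after.
        x = add(x, transpose(self.node_mix(transpose(x))))
        # Feature mixing acts along the feature axis of every node.
        return add(x, self.feature_mix(x))


def blend_paths(mlp_out, graph_out, alpha):
    """Assumed convex combination of the two paths; alpha weights the MLP path."""
    return [
        [alpha * m + (1 - alpha) * g for m, g in zip(row_m, row_g)]
        for row_m, row_g in zip(mlp_out, graph_out)
    ]


rng = random.Random(0)
features = random_matrix(5, 3, rng, scale=1.0)   # 5 nodes, 3 features each
block = MixerBlock(num_nodes=5, num_features=3, hidden=8, rng=rng)
mixed = block(features)
# Stand-in for the graph path: the raw features are reused here purely for the demo.
blended = blend_paths(mixed, features, alpha=0.7)
```

Because node mixing is a dense map over the node axis, every node can exchange information with every other node in a single layer, independent of graph distance. This is one way to read the captions' claim that the MLP path supplies the global representation.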