
Extending Graph Condensation to Multi-Label Datasets: A Benchmark Study

Liangliang Zhang, Haoran Bao, Yao Ma

TL;DR

This work tackles the scalability challenge of training GNNs on large multi-label graphs by extending graph condensation to handle multiple labels per node. It adapts three condensation methods (GCond, SGDD, GCDM) to the multi-label setting through new initialization and loss strategies, and benchmarks them across eight real-world datasets. The study identifies that GCond with K-Center initialization and BCELoss, especially with structure learning, yields strong performance and highlights practical guidelines for multi-label condensation. The resulting benchmark provides a foundation for scalable, efficient learning on multi-label graph data and informs real-world applications where nodes bear multiple annotations.
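The TL;DR names K-Center initialization without spelling it out. What follows is a minimal sketch, not the authors' code. It assumes the common greedy farthest-first variant of K-Center run over node feature vectors. It also assumes the synthetic label matrix $Y'$ is built by copying each selected node's full multi-hot label vector. `greedy_k_center` and `init_synthetic` are hypothetical names.

```python
import math
import random


def greedy_k_center(features, k, seed=0):
    """Return k distinct row indices chosen by greedy farthest-first traversal.

    Start from one random node, then repeatedly add the node whose distance
    to its nearest already-chosen center is largest.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of nodes")
    rng = random.Random(seed)
    centers = [rng.randrange(n)]
    chosen = set(centers)
    # Euclidean distance from every node to its nearest chosen center so far.
    min_dist = [math.dist(f, features[centers[0]]) for f in features]
    while len(centers) < k:
        # The unchosen node farthest from the current center set is added next.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        centers.append(nxt)
        chosen.add(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))
    return centers


def init_synthetic(features, labels, k, seed=0):
    """Hypothetical initializer for S = {A', X', Y'}: copy X' rows and full
    multi-hot Y' rows from the selected nodes (A' is left to each method)."""
    idx = greedy_k_center(features, k, seed)
    x_syn = [list(features[i]) for i in idx]
    y_syn = [list(labels[i]) for i in idx]
    return idx, x_syn, y_syn
```

Single-label initializers commonly sample nodes class by class. This sketch selects whole nodes instead, which avoids having a multi-label node count toward several class quotas at once. It also keeps label co-occurrence intact in $Y'$.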

Abstract

As graph data grows increasingly complex, training graph neural networks (GNNs) on large-scale datasets presents significant challenges, including computational resource constraints, data redundancy, and transmission inefficiencies. While existing graph condensation techniques have shown promise in addressing these issues, they are predominantly designed for single-label datasets, where each node is associated with a single class label. However, many real-world applications, such as social network analysis and bioinformatics, involve multi-label graph datasets, where a single node can carry multiple related labels. To address this gap, we extend traditional graph condensation approaches to multi-label datasets by modifying both synthetic dataset initialization and the condensation optimization. Experiments on eight real-world multi-label graph datasets demonstrate the effectiveness of our approach. In our experiments, the GCond framework combined with K-Center initialization and binary cross-entropy loss (BCELoss) generally achieves the best performance. This benchmark for multi-label graph condensation not only enhances the scalability and efficiency of GNNs on multi-label graph data but also offers substantial benefits for diverse real-world applications.
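The abstract credits BCELoss for much of GCond's multi-label performance. Multi-label BCE treats each of the $C$ labels as an independent binary task on the model's logits. The minimal sketch here computes the same quantity as PyTorch's `torch.nn.BCEWithLogitsLoss` with its default mean reduction. It is written in plain Python so the arithmetic is visible. It is not the authors' implementation, and `bce_with_logits` is a hypothetical name.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (node, label) pairs.

    logits:  rows of raw scores, one row per node, one column per label.
    targets: matching rows of 0/1 multi-hot labels.
    """
    total, count = 0.0, 0
    for row_z, row_y in zip(logits, targets):
        for z, y in zip(row_z, row_y):
            # Numerically stable form of
            # -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count
```

Softmax cross-entropy makes a node's label scores compete with each other. Here nothing does, so a node can be pushed toward several positive labels at once. This section does not show where the loss enters each condensation method's objective.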

Paper Structure

This paper contains 21 sections, 21 equations, 7 figures, 9 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Workflow for multi-label graph condensation. It shows the process of condensing a large multi-label graph $\mathcal{G}=\{A, X, Y\}$ into a smaller synthetic graph $\mathcal{S}=\{A', X', Y'\}$, where $Y$ and $Y'$ are the multi-label matrices. Various matching strategies, denoted by $\mathcal{M}(\cdot)$, are employed to ensure that key information in the original graph is captured. Our goal is to use the synthetic graph $\mathcal{S}$ to train a GNN that achieves performance comparable to one trained on the original graph $\mathcal{G}$, thus reducing graph size while retaining performance.
  • Figure 2: Multi-label Correlation Visualization
  • Figure 3: Multi-label Class Distribution Visualization
  • Figure 4: Performance of Different Initialization Methods. Model performance metrics, with F1-scores reported as decimal values.
  • Figure 5: Performance with SoftMargin Loss Optimization. Model performance metrics, with F1-scores reported as decimal values.
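Figures 4 and 5 report F1-scores as decimals, but this summary does not state the averaging scheme or the decision threshold. The minimal sketch here shows one common multi-label convention. It assumes predictions come from a 0.5 threshold on sigmoid outputs and that F1 is micro-averaged over all (node, label) pairs. Both choices are assumptions, and `to_multi_hot` and `micro_f1` are hypothetical names.

```python
import math


def to_multi_hot(logits, threshold=0.5):
    """Turn per-label logits into 0/1 predictions where sigmoid(z) > threshold.

    Compares z against logit(threshold) directly, which avoids overflow in exp.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    cut = math.log(threshold / (1.0 - threshold))
    return [[1 if z > cut else 0 for z in row] for row in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-hot matrices, returned as a decimal in [0, 1]."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if p and t:
                tp += 1  # predicted and present
            elif p:
                fp += 1  # predicted but absent
            elif t:
                fn += 1  # present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Micro-averaging pools every (node, label) decision, so frequent labels dominate the score. A macro average would instead weight each label equally. Which of the two the figures use is not stated here.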