Synergistic Graph Fusion via Encoder Embedding

Cencheng Shen, Carey E. Priebe, Jonathan Larson, Ha Trinh

TL;DR

The paper addresses supervised vertex classification for multiple graphs that share a common vertex set. It introduces graph fusion embedding, which encodes each graph via a label-informed encoder, normalizes the result, and concatenates the per-graph embeddings into an $n \times MK$ representation that can be passed to downstream classifiers. Theoretical results under both the DC-SBM and a general graph model show convergence to class-conditioned means at an $O(1/\sqrt{n})$ rate and establish an asymptotic condition for perfect separation. They also establish a synergistic effect: for sufficiently large $n$, adding graphs cannot degrade classification performance and can improve it. Simulations and real-data experiments with two, three, and four graphs show consistent improvements and practical scalability, indicating the method's potential for data fusion across diverse domains.
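
The encode, normalize, and concatenate steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the label-informed encoder is a one-hot label matrix scaled by class counts (so that each embedding coordinate is a vertex's average connectivity to one class) and that normalization means scaling each row to unit Euclidean norm. Neither detail is stated in this summary, and the function names `encoder_embedding` and `graph_fusion_embedding` are hypothetical.

```python
import numpy as np

def encoder_embedding(A, y, K):
    """Embed one graph: Z = A @ W, then scale each row to unit norm.

    Assumption: W[i, k] = 1 / n_k if vertex i has label k, else 0.
    """
    n = A.shape[0]
    counts = np.bincount(y, minlength=K).astype(float)
    W = np.zeros((n, K))
    W[np.arange(n), y] = 1.0 / counts[y]
    Z = A @ W
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave isolated vertices as zero rows
    return Z / norms

def graph_fusion_embedding(graphs, y, K):
    """Concatenate the M per-graph embeddings into an n x (M*K) matrix."""
    return np.hstack([encoder_embedding(A, y, K) for A in graphs])

# Toy data: M = 2 stochastic-block-model graphs on a shared vertex set.
rng = np.random.default_rng(0)
n, K, M = 60, 3, 2
y = np.repeat(np.arange(K), n // K)
graphs = []
for _ in range(M):
    P = np.where(y[:, None] == y[None, :], 0.5, 0.1)  # edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)       # sample upper triangle
    graphs.append((upper | upper.T).astype(float))     # symmetrize

Z = graph_fusion_embedding(graphs, y, K)
print(Z.shape)  # (60, 6), i.e. n x MK
```

The matrix `Z` can then be passed to any standard classifier. Because each graph contributes its own block of `K` columns, adding another graph only appends columns and leaves the existing ones unchanged.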

Abstract

In this paper, we introduce a method called graph fusion embedding, designed for multi-graph embedding with shared vertex sets. Under the framework of supervised learning, our method exhibits a remarkable and highly desirable synergistic effect: for sufficiently large vertex size, the accuracy of vertex classification consistently benefits from the incorporation of additional graphs. We establish the mathematical foundation for the method, including the asymptotic convergence of the embedding, a sufficient condition for asymptotic optimal classification, and the proof of the synergistic effect for vertex classification. Our comprehensive simulations and real data experiments provide compelling evidence supporting the effectiveness of our proposed method, showcasing the pronounced synergistic effect for multiple graphs from disparate sources.

Paper Structure (26 sections, 10 theorems, 46 equations, 6 figures, 6 tables)

Key Result

Theorem 1

Suppose that $\{\mathbf{A}_{m}, m=1,\ldots,M\}$ follows the DC-SBM model in Section sec1. Let $n$ be the number of vertices with known labels. Then, for any vertex $i$ belonging to class $\mathbf{Y}_i$, its graph fusion embedding satisfies:

Figures (6)

  • Figure 1: Runtime performance of the graph fusion embedding (left panel) and the unfolded spectral embedding (right panel). The graphs are generated by simulation model 2 in Section \ref{sec:sim}, and we carry out $20$ Monte-Carlo replicates at each sample size $n$. Each line represents the average running time, with error bars indicating the standard deviation.
  • Figure 2: Visualization of three SBM graphs at $n=1000$.
  • Figure 3: The figure shows the 5-fold classification error of graph fusion embedding, averaged over 20 Monte-Carlo replicates, for the three simulation settings.
  • Figure B1: The same experiments as in Figure \ref{fig1} were conducted using three heterogeneous signal graphs, and the blue dotted line shows the error for adding $17$ noise graphs to the three signal graphs.
  • Figure B2: The same experiments as in Figure \ref{fig1} were conducted using three different classifiers: neural network (first row), nearest-neighbor (second row), and linear discriminant (third row).
  • ...and 1 more figure

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • ...and 5 more