Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network

Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng

TL;DR

Hybrid FedGraph tackles hybrid federated learning (HBFL) by learning representations at the client level and aggregating them with a server-side graph convolutional network (GCN) that encodes the feature-sharing structure. A privacy score selects the optimal local feature-extractor depth $j$, and Class-conditioned Random Clustering (CRC) preserves privacy while enabling mean-embedding aggregation. Experiments on Fashion-MNIST with vertical partial feature sharing show CRC achieving substantial gains over K-means and random clustering, illustrating the value of representation-level fusion in privacy-preserving HBFL and vertical federated learning (VFL). The work provides a practical, privacy-conscious framework for real-world multi-client learning with heterogeneous data distributions.

Abstract

Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. Most existing works have focused on horizontal or vertical data distributions, where each client possesses different samples with shared features, or each client fully shares only sample indices, respectively. However, the hybrid scheme is much less studied, even though it is much more common in the real world. Therefore, in this paper, we propose a generalized algorithm, FedGraph, that introduces a graph convolutional neural network to capture feature-sharing information while learning features from a subset of clients. We also develop a simple but effective clustering algorithm that aggregates features produced by the deep neural networks of each client while preserving data privacy.
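The clustering-based aggregation described above can be illustrated with a minimal sketch. It assumes that Class-conditioned Random Clustering groups a client's embeddings at random within each class and sends only the per-cluster means to the server. The summary does not spell out the exact procedure, so the function name `class_conditioned_random_clusters`, the `cluster_size` parameter, and the handling of leftover samples are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def class_conditioned_random_clusters(embeddings, labels, cluster_size, seed=0):
    """Return per-cluster mean embeddings and their class labels.

    Assumption: clusters are formed by random assignment within each class,
    and every cluster holds at least `cluster_size` samples whenever the
    class has that many, so no mean is computed from a single sample.
    """
    rng = np.random.default_rng(seed)
    means, mean_labels = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # samples belonging to class c
        rng.shuffle(idx)                    # random assignment within the class
        n_clusters = max(1, len(idx) // cluster_size)
        for members in np.array_split(idx, n_clusters):
            means.append(embeddings[members].mean(axis=0))
            mean_labels.append(c)
    return np.stack(means), np.array(mean_labels)

# Toy usage: 10 embeddings of dimension 4, two classes.
emb = np.arange(40, dtype=float).reshape(10, 4)
lab = np.array([0] * 6 + [1] * 4)
cluster_means, cluster_labels = class_conditioned_random_clusters(emb, lab, cluster_size=3)
print(cluster_means.shape, cluster_labels)
```

Only the cluster means leave the client in this sketch, which is the sense in which averaging is assumed to protect individual embeddings. Larger clusters hide more per-sample detail but give the server fewer training points.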

Paper Structure

This paper contains 12 sections, 9 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The data distribution patterns of (a) HFL, (b) VFL, and (c) HBFL.
  • Figure 2: Medical diagnosis: a use case example of HBFL.
  • Figure 3: An overview of the training procedure for the proposed model includes: (a) training each client's model, (b) obtaining the optimal feature extractor from each client, and (c) training the server's GCN using embeddings produced by the clients' feature extractors.
  • Figure 4: An illustrative example of graph construction for a GCN based on a three-client scenario.
  • Figure 5: Validation results for Fashion-MNIST regarding: (a) the number of hidden layers and (b) the cluster size $\delta$. In (b), we assess the accuracy of collaborative predictions made by the trained GCN.
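
As a rough illustration of the graph construction and GCN step named in Figures 3 and 4, the sketch below builds a client graph and applies one standard graph-convolution layer (symmetric normalization with self-loops). The edge rule used here, connecting two clients whenever they hold at least one feature in common, is an assumption, as are the helper names `feature_sharing_adjacency` and `gcn_layer`. The paper's actual construction may differ.

```python
import numpy as np

def feature_sharing_adjacency(client_features):
    """Adjacency matrix over clients.

    Assumption: clients i and j are connected if their feature sets overlap.
    """
    n = len(client_features)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if client_features[i] & client_features[j]:
                A[i, j] = A[j, i] = 1.0
    return A

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees (always >= 1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Three-client toy scenario: clients 0 and 1 share feature 1, client 2 shares none.
A = feature_sharing_adjacency([{0, 1}, {1, 2}, {3}])
H = np.eye(3)                               # placeholder node embeddings
W = np.eye(3)                               # placeholder weight matrix
print(gcn_layer(A, H, W))
```

With identity placeholders for `H` and `W`, the output is just the normalized adjacency, which makes it easy to see that connected clients mix their embeddings while an isolated client keeps its own.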