Federated Incomplete Multi-View Clustering with Heterogeneous Graph Neural Networks
Xueming Yan, Ziqi Wang, Yaochu Jin
TL;DR
FIM-GNNs address federated incomplete multi-view clustering under data heterogeneity and privacy constraints. Each client deploys heterogeneous GNNs (GCN/GAT) to extract view-specific features, and the server aggregates the features of overlapping samples into a global representation. The model optimizes a joint loss $L = L_r + \gamma L_c$, where $L_r$ is the reconstruction loss of a graph autoencoder and $L_c$ is a KL-based clustering loss against globally updated pseudo-labels $P$. The global pseudo-label mechanism, combined with weighted aggregation across heterogeneous views and Hungarian alignment, keeps clustering consistent across incomplete views. Experiments on Caltech-7 and BDGP show performance competitive with or superior to state-of-the-art incomplete MVC methods in privacy-preserving federated settings with incomplete data.
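The joint loss $L = L_r + \gamma L_c$ can be illustrated with a minimal NumPy sketch. Everything beyond the formula itself is an assumption made for illustration: the inner-product decoder with a squared-error reconstruction term, the Student's t soft assignment $Q$, the per-sample averaging of the KL term, and all function and parameter names are hypothetical and may differ from what FIM-GNNs actually use.

```python
import numpy as np

def soft_assignments(z, centers):
    # ASSUMPTION: Student's t kernel (as in deep embedded clustering) turns
    # embedding-to-centroid distances into soft cluster assignments Q.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def reconstruction_loss(adj, z):
    # ASSUMPTION: inner-product decoder sigmoid(Z Z^T) compared with the
    # adjacency matrix by mean squared error; stands in for L_r.
    adj_hat = 1.0 / (1.0 + np.exp(-(z @ z.T)))
    return float(np.mean((adj - adj_hat) ** 2))

def clustering_loss(p, q, eps=1e-12):
    # KL(P || Q), averaged over samples; P plays the role of the
    # globally updated pseudo-labels and Q the local soft assignments.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) / p.shape[0])

def joint_loss(adj, z, centers, p, gamma=0.1):
    # L = L_r + gamma * L_c; gamma=0.1 is an arbitrary illustrative default.
    q = soft_assignments(z, centers)
    l_r = reconstruction_loss(adj, z)
    l_c = clustering_loss(p, q)
    return l_r + gamma * l_c, l_r, l_c
```

In this sketch, when the pseudo-labels $P$ coincide with the local assignments $Q$, the clustering term vanishes and the joint loss reduces to the reconstruction term alone; $\gamma$ controls how strongly a client is pulled toward the global pseudo-labels.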
Abstract
Federated multi-view clustering can learn a global clustering model from data distributed across multiple devices. However, current methods are challenged by the absence of label information and by strict data-privacy requirements. A significant issue is feature heterogeneity across multi-view data, which makes it difficult to mine complementary clustering information. In addition, multi-view data in a distributed setting are often incomplete, which further complicates clustering. To address these challenges, we introduce a federated incomplete multi-view clustering framework with heterogeneous graph neural networks (FIM-GNNs). In FIM-GNNs, autoencoders built on heterogeneous graph neural network models extract features from the multi-view data at each client site. At the server, heterogeneous features from the overlapping samples of each client are aggregated into a global feature representation. The server also generates global pseudo-labels, which guide the integration and refinement of clustering across data views and improve the handling of incomplete view data. Experiments on public benchmark datasets evaluate FIM-GNNs against state-of-the-art algorithms.
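The server-side aggregation over overlapping samples described above might look like the following sketch. It is not the authors' implementation: it assumes each client uploads sample identifiers alongside embeddings, that identifiers are unique within a client, that all clients' embeddings share one dimension, and that the per-view weights are given. The function name `aggregate_overlap` and its parameters are hypothetical.

```python
import numpy as np

def aggregate_overlap(client_ids, client_feats, weights=None):
    """Fuse embeddings of samples observed by every client.

    client_ids:   list of 1-D integer arrays of sample ids (unique per client)
    client_feats: list of (n_v, d) arrays, row i belongs to client_ids[v][i]
    weights:      optional per-client weights (ASSUMPTION: supplied externally)
    """
    # Overlapping samples are the ids present at every client.
    common = client_ids[0]
    for ids in client_ids[1:]:
        common = np.intersect1d(common, ids)

    if weights is None:
        weights = np.ones(len(client_ids))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the fused feature is a convex combination

    fused = np.zeros((len(common), client_feats[0].shape[1]))
    for ids, feats, w_v in zip(client_ids, client_feats, w):
        # Locate each common id inside this client's local ordering.
        order = np.argsort(ids)
        pos = order[np.searchsorted(ids[order], common)]
        fused += w_v * feats[pos]
    return common, fused
```

Only samples present at every client contribute to the fused representation in this sketch; samples missing from some views would rely on the global pseudo-labels for guidance instead.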
