FedHK-MVFC: Federated Heat Kernel Multi-View Clustering
Kristina P. Sinaga
TL;DR
This work introduces HK-MVFC, a heat-kernel-based multi-view fuzzy clustering framework, and its federated extension FedHK-MVFC for privacy-preserving collaboration across distributed healthcare sites. By replacing Euclidean distances with geometry-aware kernel distances derived from heat-kernel coefficients, the method captures intrinsic data manifold structure and supports adaptive view weighting to handle view heterogeneity. Theoretical contributions include convergence guarantees, adaptive weighting analysis, and privacy-preserving protocols (differential privacy and secure aggregation) within a federated learning setting. Experiments on synthetic multi-view cardiovascular data show improved clustering accuracy, reduced communication, and robust performance under data heterogeneity, illustrating practical potential for collaborative phenotyping while safeguarding sensitive medical information. The framework further outlines real-world extensions to healthcare, Future Internet scenarios, and cross-domain applications, highlighting the balance between geometric fidelity, privacy, and scalability.
Abstract
In the realm of distributed artificial intelligence (AI) and privacy-focused medical applications, this paper proposes a multi-view clustering framework that links quantum field theory with federated healthcare analytics. The method uses heat kernel coefficients from spectral analysis to convert Euclidean distances into geometry-aware similarity measures that capture the structure of diverse medical data. The framework is formulated through the heat kernel distance (HKD) transformation, which has convergence guarantees. Two algorithms are developed: Heat Kernel-Enhanced Multi-View Fuzzy Clustering (HK-MVFC) for centralized analysis, and Federated Heat Kernel Multi-View Fuzzy Clustering (FedHK-MVFC) for secure, privacy-preserving learning across hospitals. FedHK-MVFC uses differential privacy and secure aggregation to enable HIPAA-compliant collaboration. Tests on synthetic cardiovascular patient datasets demonstrate increased clustering accuracy, reduced communication, and retained efficiency compared to centralized methods. Validated on 10,000 synthetic patient records across two hospitals, the methods proved useful for collaborative phenotyping involving electrocardiogram (ECG) data, cardiac imaging data, and behavioral data. The theoretical contributions include update rules with proven convergence, adaptive view weighting, and privacy-preserving protocols. These contributions establish a new standard for geometry-aware federated learning in healthcare, translating advanced mathematics into practical solutions for analyzing sensitive medical data while ensuring rigor and clinical relevance.
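
To make the idea of converting Euclidean distances into bounded, geometry-aware ones concrete, the sketch below uses a generic exponential kernel distance of the form 1 - exp(-sum_j delta_j (x_j - a_j)^2). This is an illustration under stated assumptions, not the paper's exact HKD definition: the function name `heat_kernel_distance`, the per-feature coefficients `delta`, and the toy data are all hypothetical.

```python
import numpy as np

def heat_kernel_distance(X, centers, delta):
    """Map weighted squared Euclidean distances to kernel distances in [0, 1).

    X:       (n, d) array of samples from one view
    centers: (c, d) array of cluster centers for that view
    delta:   (d,) nonnegative per-feature coefficients (assumed given here;
             how they are estimated is not specified in this sketch)
    """
    diff = X[:, None, :] - centers[None, :, :]            # shape (n, c, d)
    weighted_sq = np.einsum("ncd,d->nc", diff ** 2, delta)  # shape (n, c)
    return 1.0 - np.exp(-weighted_sq)

# Toy example: two samples, one center, equal coefficients (all hypothetical).
X = np.array([[0.0, 0.0], [3.0, 4.0]])
centers = np.array([[0.0, 0.0]])
delta = np.array([0.1, 0.1])
D = heat_kernel_distance(X, centers, delta)
```

Unlike the raw squared distance, which grows without bound, this transformed distance saturates toward 1 for far-away points, so outliers have limited influence on the cluster centers.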
