Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls
Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic
TL;DR
The paper addresses learning on tree-like data, using hyperbolic geometry to obtain low-distortion embeddings and to perform privacy-preserving federated classification across distributed biomedical datasets. It proposes a one-shot federated SVM in the Poincaré disc in which clients communicate minimal convex hull information. Label switching is resolved with $B_h$ sequences, hulls are transmitted securely using Poincaré quantization with Reed-Solomon-like encoding, and the server aggregates the hulls through balanced graph partitioning. The key contributions are a hyperbolic Graham scan for convex hulls, ε-Poincaré quantization, an analysis of convex hull complexity and privacy leakage, $B_h$-based label encoding, secure SCMA transmission, and a graph-partitioning server aggregator that enables accurate global SVM learning. Experiments on synthetic and single-cell RNA-seq datasets show that the federated hyperbolic approach can outperform Euclidean federated methods and approach centralized performance, which makes it relevant for privacy-preserving learning on hierarchical biological data.
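As a brief illustration of the geometry involved, the sketch below computes the standard geodesic distance in the Poincaré disc model, assuming curvature $-1$. The function name `poincare_distance` and the tuple representation of points are illustrative choices, not taken from the paper.

```python
import math

def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit disc.

    Uses the standard Poincare disc formula (curvature -1 assumed):
    d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    """
    nx = x[0] ** 2 + x[1] ** 2          # squared Euclidean norm of x
    ny = y[0] ** 2 + y[1] ** 2          # squared Euclidean norm of y
    if nx >= 1.0 or ny >= 1.0:
        raise ValueError("points must lie strictly inside the unit disc")
    diff = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nx) * (1.0 - ny)))
```

In this model the distance from the origin to a point at Euclidean radius $r$ equals $2\,\mathrm{artanh}(r)$, which grows without bound as $r$ approaches 1. This is the property that leaves room for the exponentially growing number of nodes in a tree.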
Abstract
Hierarchical and tree-like data sets arise in many applications, including language processing, graph data mining, phylogeny and genomics. It is known that tree-like data cannot be embedded into Euclidean spaces of finite dimension with small distortion. This problem can be mitigated through the use of hyperbolic spaces. When such data also has to be processed in a distributed and privatized setting, it becomes necessary to work with new federated learning methods tailored to hyperbolic spaces. As an initial step towards the development of the field of federated learning in hyperbolic spaces, we propose the first known approach to federated classification in hyperbolic spaces. Our contributions are as follows. First, we develop distributed versions of convex SVM classifiers for Poincaré discs. In this setting, the information conveyed from clients to the global classifier consists of the convex hulls of clusters present in individual client data. Second, to avoid label switching issues, we introduce a number-theoretic approach for label recovery based on the so-called integer $B_h$ sequences. Third, we compute the complexity of the convex hulls in hyperbolic spaces to assess the extent of data leakage; at the same time, in order to limit the communication cost for the hulls, we propose a new quantization method for the Poincaré disc coupled with Reed-Solomon-like encoding. Fourth, at the server level, we introduce a new approach for aggregating the convex hulls of the clients based on balanced graph partitioning. We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories that have stringent privacy constraints. The classification accuracy of our method is up to $\sim 11\%$ better than that of its Euclidean counterpart, demonstrating the importance of privacy-preserving learning in hyperbolic spaces.
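To make the $B_h$ property concrete: a set of integers is a $B_h$ sequence if all sums of $h$ of its elements, with repetition allowed, are distinct. The sketch below checks this property by brute force and shows how a multiset of $h$ elements can then be recovered from its sum alone. The function names and the toy decoding step are illustrative assumptions; they are not the label-recovery protocol of the paper.

```python
from itertools import combinations_with_replacement

def is_bh_sequence(seq, h):
    """Return True if all sums of h elements of seq (repetition allowed) are distinct."""
    seen = set()
    for combo in combinations_with_replacement(sorted(seq), h):
        s = sum(combo)
        if s in seen:
            return False          # two different multisets share a sum
        seen.add(s)
    return True

def decode_sum(seq, h, total):
    """Return the multiset of h elements of seq that sums to total, or None.

    The answer is unique whenever seq is a B_h sequence.
    Brute-force search, meant only for small illustrative inputs.
    """
    for combo in combinations_with_replacement(sorted(seq), h):
        if sum(combo) == total:
            return combo
    return None

# Toy example: powers of h + 1 form a B_h sequence (here h = 3).
labels = [1, 4, 16, 64]
recovered = decode_sum(labels, 3, 1 + 16 + 16)   # (1, 16, 16)
```

For $h = 2$ the definition reduces to that of a Sidon set: $\{1, 2, 5, 11\}$ qualifies, whereas $\{1, 2, 3\}$ does not because $1 + 3 = 2 + 2$.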
