Effective and Efficient Federated Tree Learning on Hybrid Data
Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song
TL;DR
HybridTree addresses federated learning on hybrid data, where parties hold different features and samples, by introducing a meta-rule–driven tree transformation and a layer-level training strategy that appends guest-specific layers. The method integrates knowledge across parties with limited communication, achieving accuracy close to centralized training while delivering substantial speedups over node-level baselines. Extensive experiments on synthetic and real tabular data demonstrate strong performance and scalability under heterogeneity and in multi-party settings. The approach offers a practical privacy-preserving solution for hybrid FL and suggests avenues for extension to multi-modal data.
Abstract
Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing federated learning studies focus on either the horizontal or the vertical data setting, in which the data of different parties are assumed to share the same feature space or the same sample space, respectively. In practice, a common scenario is the hybrid data setting, where data from different parties may differ in both features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level solution that trains a tree without frequent communication. Our experiments demonstrate that HybridTree achieves accuracy comparable to the centralized setting with low computational and communication overhead, and a speedup of up to 8x over the other baselines.
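To make the distinction between the horizontal, vertical, and hybrid settings concrete, the following is a minimal illustrative sketch, not part of HybridTree itself. It assumes a toy representation in which each party is described only by the set of sample IDs and the set of feature names it holds; the function name `data_setting` and the example parties are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy representation: party name -> (sample IDs, feature names).
Party = Tuple[Set[int], Set[str]]


def data_setting(parties: Dict[str, Party]) -> str:
    """Classify a data partition as 'horizontal', 'vertical', or 'hybrid'.

    horizontal: same feature space, different samples
    vertical:   same sample space, different features
    hybrid:     parties differ in both features and samples
    """
    sample_sets = {frozenset(samples) for samples, _ in parties.values()}
    feature_sets = {frozenset(features) for _, features in parties.values()}
    same_samples = len(sample_sets) == 1
    same_features = len(feature_sets) == 1

    if same_features and not same_samples:
        return "horizontal"
    if same_samples and not same_features:
        return "vertical"
    if not same_samples and not same_features:
        return "hybrid"
    return "identical"  # every party holds exactly the same data layout


# Illustrative partitions (made-up parties and features).
horizontal = {
    "A": ({1, 2}, {"age", "income"}),
    "B": ({3, 4}, {"age", "income"}),
}
vertical = {
    "A": ({1, 2, 3}, {"age"}),
    "B": ({1, 2, 3}, {"income"}),
}
hybrid = {
    "bank": ({1, 2, 3, 4}, {"age", "income"}),
    "shop": ({2, 3, 5}, {"purchases"}),
}

for name, partition in [("horizontal", horizontal),
                        ("vertical", vertical),
                        ("hybrid", hybrid)]:
    print(name, "->", data_setting(partition))
```

In the hybrid example, the two parties overlap on only some samples and hold disjoint features, which is the case that neither a purely horizontal nor a purely vertical method covers directly.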
