Effective and Efficient Federated Tree Learning on Hybrid Data

Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song

TL;DR

HybridTree addresses federated learning on hybrid data where parties hold different features and samples by introducing a meta-rule–driven tree transformation and a layer-level training strategy that appends guest-specific layers. The method enables efficient knowledge integration with limited communication, achieving accuracy close to centralized training while delivering substantial speedups over node-level baselines. Extensive experiments on synthetic and real tabular data demonstrate strong performance and scalability under heterogeneity and multi-party settings. The approach offers a practical privacy-preserving solution for hybrid FL and suggests avenues for extending to multi-modal data.

Abstract

Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing federated learning studies focus on either horizontal or vertical data settings, where the data of different parties are assumed to share the same feature space or the same sample space. In practice, a common scenario is the hybrid data setting, where data from different parties may differ in both features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level solution that trains a tree without frequent communication. Our experiments demonstrate that HybridTree can achieve accuracy comparable to the centralized setting with low computational and communication overhead. HybridTree can achieve a speedup of up to 8 times over the other baselines.

Paper Structure (43 sections, 4 theorems, 8 equations, 9 figures, 13 tables, 2 algorithms)

Key Result

Theorem 2

Suppose $F_g$ is a meta-rule in Tree A. For any input instance $\textbf{x} \in \mathcal{D}$, we have $E[f(\textbf{x};\theta_A)] = E[f(\textbf{x};\theta_B)]$, i.e., the expected predictions of Tree A and Tree B are the same.
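
As a rough illustration of the kind of equivalence the theorem describes, the toy sketch below builds two small trees by hand. It assumes that Tree B is obtained from Tree A by moving the guest split $F_g$ from the root to the last layer, which this summary does not state explicitly. The feature names, thresholds, and leaf values are all hypothetical. In this toy case the two trees agree on every input, which is stronger than (and therefore implies) equality in expectation.

```python
import itertools

# Hypothetical leaf values; not taken from the paper.
L1, L2, L3 = 1.0, -0.5, 0.25

def f_g(x):
    # Guest-side split rule (assumed feature name and threshold).
    return x["guest_feat"] > 0.5

def f_h(x):
    # Host-side split rule (assumed feature name and threshold).
    return x["host_feat"] > 0.0

def tree_a(x):
    # Guest rule at the root: whenever F_g holds, the output is L1
    # regardless of the host feature, so F_g -> L1 acts as a meta-rule.
    if f_g(x):
        return L1
    return L2 if f_h(x) else L3

def tree_b(x):
    # Host rule at the root, guest rule pushed down to the last layer.
    if f_h(x):
        return L1 if f_g(x) else L2
    return L1 if f_g(x) else L3

# Enumerate one input per combination of split outcomes.
grid = [
    {"guest_feat": g, "host_feat": h}
    for g, h in itertools.product([0.0, 1.0], [-1.0, 1.0])
]
all_equal = all(tree_a(x) == tree_b(x) for x in grid)
print(all_equal)
```

Because the guest split sits only in the last layer of `tree_b`, a host could in principle evaluate the upper layers alone and leave the final split to the guest, which is the intuition behind appending guest-specific layers.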

Figures (9)

  • Figure 1: Hybrid data partitioning.
  • Figure 2: Two examples of meta-rules. $F$ is the split condition and $L$ is the leaf value. In (a), $F_g\rightarrow L_1$ exists in both trees. In (b), $\neg F_{h_1}\rightarrow F_g \rightarrow L_2$ exists in both trees.
  • Figure 3: (a) The proportion of trees that have the same meta-rules. (b) $F_g$ is a split rule with the split feature from guests. $F_h$ is a split rule with the split feature from the host. $L$ represents leaf nodes.
  • Figure 4: A comparison between node-level solution (a) and our layer-level solution (b). All parties jointly update each node in (a) while each party only updates a segmented tree individually in (b).
  • Figure 5: The inference process of HybridTree.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Definition 1
  • Theorem 2
  • Theorem 3
  • Definition 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof