TreeCSS: An Efficient Framework for Vertical Federated Learning

Qinbo Zhang; Xiao Yan; Yukai Ding; Quanqing Xu; Chuang Hu; Xiaokai Zhou; Jiawei Jiang

TL;DR

TreeCSS tackles the core scalability bottlenecks of vertical federated learning (VFL) by combining Tree-MPSI for scalable data alignment with a clustering-based coreset strategy for training. Tree-MPSI reduces interaction rounds through a tree structure and volume-aware scheduling. Cluster-Coreset applies K-Means clustering to each participant's local features, merges the results as encrypted cluster tuples, and weights the selected samples to produce a compact, informative training set. Across six datasets and multiple models, TreeCSS achieves up to 2.93x end-to-end speedups with accuracy comparable to vanilla VFL, while preserving privacy through homomorphic encryption. The approach generalizes across tasks and models, reduces data and communication burdens, and enables scalable, privacy-preserving VFL in practical deployments.
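The round reduction from a tree structure can be illustrated with a minimal sketch. It rests on loud assumptions. Plain Python set intersection stands in for a real two-party PSI protocol, so no cryptography is performed. The pairing rule is a hypothetical volume-aware heuristic, not the paper's scheduling strategy: sort the active sets by size, pair the smallest with the largest, and let the larger party keep the result. The names `tree_intersect` and `tree_rounds` are illustrative.

```python
import math
from typing import Dict, List, Set, Tuple


def tree_rounds(n: int) -> int:
    """Rounds a balanced pairing tree needs for n >= 1 parties (a chain needs n - 1)."""
    return math.ceil(math.log2(n))


def tree_intersect(
    parties: Dict[str, Set[int]],
) -> Tuple[Set[int], int, List[List[Tuple[str, str]]]]:
    """Intersect every party's sample-ID set in ceil(log2 n) parallel rounds.

    Plain set intersection stands in for a two-party PSI protocol; no
    cryptography happens here. Returns (common IDs, rounds, pairing schedule).
    """
    if not parties:
        raise ValueError("need at least one party")
    # Each active entry is (party holding the partial result, that result).
    active = list(parties.items())
    schedule: List[List[Tuple[str, str]]] = []
    while len(active) > 1:
        # Hypothetical volume-aware rule (assumption, not the paper's):
        # sort by set size and pair the smallest remaining set with the largest.
        active.sort(key=lambda item: len(item[1]))
        pairs, next_active = [], []
        lo, hi = 0, len(active) - 1
        while lo < hi:
            (small_name, small_set), (large_name, large_set) = active[lo], active[hi]
            pairs.append((small_name, large_name))
            # The larger party keeps the intersection for the next round.
            next_active.append((large_name, small_set & large_set))
            lo += 1
            hi -= 1
        if lo == hi:
            # Odd number of active parties: the middle one waits a round.
            next_active.append(active[lo])
        schedule.append(pairs)  # all pairs in one round could run in parallel
        active = next_active
    return active[0][1], len(schedule), schedule
```

Each round at least halves the number of active sets, so n participants finish in ceil(log2 n) rounds. Chaining pairwise intersections one party at a time takes n - 1 rounds. For 5 parties, that is 3 rounds versus 4.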

Abstract

Vertical federated learning (VFL) considers the case where the features of data samples are partitioned across different participants. VFL consists of two main steps: identifying the data samples common to all participants (alignment) and training the model on the aligned samples (training). However, when there are many participants and data samples, both alignment and training become slow. We therefore propose TreeCSS, an efficient VFL framework that accelerates both steps. For sample alignment, we design an efficient multi-party private set intersection (MPSI) protocol called Tree-MPSI, which adopts a tree-based structure and a data-volume-aware scheduling strategy to parallelize alignment among the participants. Because model training time scales with the number of data samples, we conduct coreset selection (CSS) to choose representative data samples for training. Our CSS method, Cluster-Coreset, adopts a clustering-based scheme for security and generality: it first clusters the features locally on each participant and then merges the local clustering results to select representative samples. In addition, we weight the samples according to their distances to the centroids to reflect their importance to model training. We evaluate the effectiveness and efficiency of TreeCSS on various datasets and models. The results show that, compared with vanilla VFL, TreeCSS accelerates training by up to 2.93x while achieving comparable model accuracy.
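The coreset steps described above can be roughly sketched as follows. This is an illustration under stated assumptions, not the authors' implementation:

* Each participant holds a row-aligned feature block as a NumPy array.
* A small hand-written Lloyd's K-Means runs locally on each block.
* Each sample's tuple of local cluster IDs (its "cluster tuple") is built in plaintext. The TL;DR describes these tuples as encrypted.
* The names `local_kmeans` and `cluster_coreset` are hypothetical.

The abstract does not give the exact distance-based weighting formula. This sketch therefore uses distance only to pick each group's representative, namely the member closest to its local centroids. It then weights that representative by the number of samples it stands for, a common coreset convention rather than the paper's rule.

```python
from collections import defaultdict

import numpy as np


def local_kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-Means on one participant's own feature block.

    Returns each sample's cluster id and its squared distance to that centroid.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct samples (fancy indexing copies).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every sample to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, d2[np.arange(len(X)), labels]


def cluster_coreset(feature_blocks, k):
    """Select one weighted representative per cluster tuple.

    feature_blocks: list of (n, d_p) arrays, one per participant, rows aligned.
    Returns (selected row indices, weights).
    """
    labels, dists = [], []
    for p, block in enumerate(feature_blocks):
        lab, dist = local_kmeans(block, k, seed=p)  # runs locally on party p
        labels.append(lab)
        dists.append(dist)
    # Summed squared distance to each party's own local centroid.
    total_dist = np.sum(dists, axis=0)
    groups = defaultdict(list)
    for i in range(len(total_dist)):
        # Cluster tuple: sample i's local cluster id on every participant
        # (plaintext in this sketch).
        groups[tuple(int(lab[i]) for lab in labels)].append(i)
    indices, weights = [], []
    for members in groups.values():
        # Representative: the member closest to its local centroids overall.
        indices.append(min(members, key=lambda i: total_dist[i]))
        # Assumed weighting: how many aligned samples the representative stands for.
        weights.append(len(members))
    return np.array(indices), np.array(weights, dtype=float)
```

Training would then use only the selected rows and scale each sample's loss by its weight. A representative of a 15-sample group would then contribute roughly as much as the 15 samples it replaces.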

Paper Structure

This paper contains 12 sections, 5 equations, 7 figures, and 2 tables.

Introduction
Related Work
Preliminaries
The TreeCSS Framework
Tree-MPSI for Data Alignment
Cluster-Coreset for Coreset Construction
Experimental Evaluation
Experiment Settings
End-to-end Performance
Evaluation of Tree-MPSI and Cluster-Coreset
Ablation and Sensitivity Study
Conclusions

Figures (7)

Figure 1: An illustration of our TreeCSS framework.
Figure 2: An illustration of Tree-MPSI for data alignment.
Figure 3: An illustration of Cluster-Coreset for coreset construction.
Figure 4: Effect of cluster size and re-weighting on model quality.
Figure 5: Effect of cluster size and re-weighting on runtime.