An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization
Lunchen Xie, Jiaqi Liu, Songtao Lu, Tsung-hui Chang, Qingjiang Shi
TL;DR
This paper tackles data isolation in vertical federated learning by introducing MP-FedXGB, a lossless, secret-sharing-based framework for multi-party XGBoost. It redesigns the two non-linear components of XGBoost so that neither requires division: the split criterion is evaluated over a common denominator, and the leaf weight calculation is cast as a distributed quadratic program. A SecureArgmax routine selects the best split without direct divisions. A security enhancement, First-Layer-Mask, mitigates potential instance-space leakage, and a complexity analysis shows lower computational cost than homomorphic-encryption (HE)-based alternatives. Empirical results on benchmark datasets show performance competitive with centralized XGBoost while preserving privacy guarantees and scaling to multiple participants. The work advances practical, privacy-preserving vertical FedXGB with improved efficiency and security for real-world multi-party collaborations.
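The common-denominator idea can be illustrated in plaintext, outside any secret-sharing protocol. The following is a minimal sketch and not the paper's SecureArgmax: the function name `division_free_argmax` and its inputs are hypothetical, and it assumes every denominator is strictly positive (as is the case for XGBoost's G^2 / (H + lambda) terms when the Hessian sums are nonnegative and lambda > 0). It selects the largest ratio by cross-multiplication, so no division is performed.

```python
def division_free_argmax(numerators, denominators):
    """Return the index i maximizing numerators[i] / denominators[i].

    Illustrative plaintext sketch only. Assumes all denominators are > 0,
    so that comparing two ratios can be done by cross-multiplication.
    """
    if not numerators or len(numerators) != len(denominators):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(d <= 0 for d in denominators):
        raise ValueError("all denominators must be strictly positive")

    best = 0
    for i in range(1, len(numerators)):
        # For positive denominators:
        #   n_i / d_i > n_best / d_best  <=>  n_i * d_best > n_best * d_i
        if numerators[i] * denominators[best] > numerators[best] * denominators[i]:
            best = i
    return best


# Ratios are 1/2, 3/4 and 2/5; the second candidate is the largest.
best_index = division_free_argmax([1, 3, 2], [2, 4, 5])
```

Multiplications and comparisons are generally much cheaper than division under secret sharing, which is the motivation for reformulating the comparison this way. Ties are resolved in favor of the earliest candidate in this sketch.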
Abstract
XGBoost is one of the most widely used machine learning models in industry due to its superior learning accuracy and efficiency. To address data isolation issues in big data problems, it is crucial to deploy a secure and efficient federated XGBoost (FedXGB) model. Existing FedXGB models either have data leakage issues or are only applicable to the two-party setting, with heavy communication and computation overheads. In this paper, a lossless multi-party federated XGBoost learning framework is proposed with a security guarantee. The framework reshapes XGBoost's split criterion calculation process under a secret sharing setting and solves the leaf weight calculation problem by leveraging distributed optimization. A thorough analysis of model security is provided as well, and numerical results on benchmark datasets demonstrate the superiority of the proposed FedXGB over state-of-the-art models.
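To make the link between leaf weights and optimization concrete: in standard XGBoost the optimal leaf weight is w* = -G / (H + lambda), which is the minimizer of the quadratic G*w + 0.5*(H + lambda)*w^2. The sketch below reaches the same value by plain gradient descent, using only multiplications and additions. It is a single-party, plaintext illustration under stated assumptions, not the distributed solver of the paper; the function name, the fixed step size, and the iteration count are all hypothetical choices.

```python
def leaf_weight_by_gradient_descent(G, H, lam, step=0.1, iters=200):
    """Approximate the XGBoost leaf weight w* = -G / (H + lam) without dividing.

    Minimizes f(w) = G*w + 0.5*(H + lam)*w**2 by gradient descent.
    Assumes H + lam > 0 and a step size below 2 / (H + lam), which is
    the condition for this iteration to converge.
    """
    if H + lam <= 0:
        raise ValueError("H + lam must be strictly positive")

    w = 0.0
    for _ in range(iters):
        grad = G + (H + lam) * w  # derivative of f at the current w
        w -= step * grad          # descent step; no division involved
    return w


# With G = -4, H = 3, lam = 1 the closed form gives -(-4) / (3 + 1) = 1.
w_approx = leaf_weight_by_gradient_descent(G=-4.0, H=3.0, lam=1.0)
```

The update uses only operations that are inexpensive on additively shared values, which is one reason an optimization view of the leaf weight is attractive in a secret-sharing setting.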
