Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning
Yan Kang, Ziyao Ren, Lixin Fan, Linghua Yang, Yongxin Tong, Qiang Yang
TL;DR
The paper tackles privacy leakage and suboptimal hyperparameter choices in SecureBoost within vertical federated learning (VFL). It introduces the Instance Clustering Attack (ICA) to quantify label leakage and two defenses (Local Trees and Purity Threshold) to mitigate it. Building on NSGA-II, the Constrained Multi-Objective SecureBoost (CMOSB) algorithm optimizes three objectives (utility loss $\epsilon_u$, training cost $\epsilon_c$, and privacy leakage $\epsilon_p$) under constraints to approximate Pareto-optimal hyperparameter configurations. Experiments on four datasets show that CMOSB finds hyperparameters with better trade-offs between privacy, utility, and efficiency than grid search and Bayesian optimization, which makes it directly relevant to building trustworthy VFL systems.
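NSGA-II ranks candidate solutions with Deb's constrained-domination rule: a feasible solution beats an infeasible one, two infeasible solutions are compared by total constraint violation, and two feasible solutions are compared by ordinary Pareto dominance. The sketch below applies that rule to extract the first (non-dominated) front over the three minimized objectives. It assumes CMOSB uses the standard NSGA-II rule; the constraints are modeled as hypothetical upper bounds on each objective, and the hyperparameter names (`max_depth`, `n_trees`) and objective values are illustrative inputs, not results from the paper. It omits the later NSGA-II steps (sorting into further fronts, crowding distance, crossover and mutation).

```python
from typing import List, Sequence, Tuple

# Each candidate is a hyperparameter configuration with objective vector
# (utility_loss, training_cost, privacy_leakage); all three are minimized.
Candidate = Tuple[str, Tuple[float, float, float]]


def violation(objs: Sequence[float], bounds: Sequence[float]) -> float:
    """Total amount by which the objectives exceed their (hypothetical) upper bounds."""
    return sum(max(0.0, o - b) for o, b in zip(objs, bounds))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def constrained_dominates(a: Sequence[float], b: Sequence[float], bounds: Sequence[float]) -> bool:
    """Deb's constrained-domination rule, as used by NSGA-II."""
    va, vb = violation(a, bounds), violation(b, bounds)
    if va == 0.0 and vb == 0.0:
        return dominates(a, b)  # both feasible: ordinary Pareto dominance
    if va == 0.0:
        return True             # feasible beats infeasible
    if vb == 0.0:
        return False            # infeasible never beats feasible
    return va < vb              # both infeasible: smaller violation wins


def first_front(cands: List[Candidate], bounds: Sequence[float]) -> List[str]:
    """Names of candidates that no other candidate constrained-dominates (rank-1 front)."""
    return [
        name
        for i, (name, objs) in enumerate(cands)
        if not any(
            constrained_dominates(other, objs, bounds)
            for j, (_, other) in enumerate(cands)
            if j != i
        )
    ]


# Illustrative inputs only (hypothetical configurations, not results from the paper).
candidates: List[Candidate] = [
    ("max_depth=3, n_trees=20", (0.20, 1.0, 0.30)),
    ("max_depth=5, n_trees=50", (0.12, 2.5, 0.55)),
    ("max_depth=4, n_trees=30", (0.15, 1.8, 0.40)),
    ("max_depth=5, n_trees=60", (0.13, 3.0, 0.60)),
    ("max_depth=6, n_trees=80", (0.10, 4.0, 0.90)),
]
bounds = (1.0, 5.0, 0.8)  # hypothetical constraint: privacy leakage must stay <= 0.8

print(first_front(candidates, bounds))
```

Here the last configuration has the lowest utility loss but violates the privacy bound, so every feasible configuration outranks it; the fourth is dominated by the second on all three objectives. The remaining three form the front, each trading utility against cost and leakage differently.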
Abstract
SecureBoost is a tree-boosting algorithm that uses homomorphic encryption (HE) to protect data privacy in vertical federated learning. SecureBoost and its variants have been widely adopted in fields such as finance and healthcare. However, SecureBoost's hyperparameters are typically configured heuristically to optimize model performance (i.e., utility) alone, on the assumption that privacy is already secured. Our study found that SecureBoost and some of its variants are still vulnerable to label leakage. Because of this vulnerability, the current heuristic hyperparameter configurations of SecureBoost may yield a suboptimal trade-off between utility, privacy, and efficiency, three pivotal elements of a trustworthy federated learning system. To address this issue, we propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm, which approximates Pareto optimal solutions, where each solution is a set of hyperparameters achieving an optimal trade-off between utility loss, training cost, and privacy leakage. We design measurements for the three objectives, including a novel label inference attack, the instance clustering attack (ICA), to measure the privacy leakage of SecureBoost. Additionally, we provide two countermeasures against ICA. Experimental results demonstrate that CMOSB yields hyperparameters that achieve a better trade-off between utility loss, training cost, and privacy leakage than those found by grid search and Bayesian optimization.
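Using the TL;DR's notation, the search described above can be written as a constrained three-objective problem. This is a generic formalization: the hyperparameter space $\mathcal{H}$ and the constraint functions $g_j$ are our own placeholder symbols, since the abstract does not specify the constraints.

$$
\min_{h \in \mathcal{H}} \; F(h) = \big(\epsilon_u(h),\, \epsilon_c(h),\, \epsilon_p(h)\big) \quad \text{s.t.} \quad g_j(h) \le 0, \quad j = 1, \dots, J.
$$

A feasible $h$ dominates a feasible $h'$ if $\epsilon_k(h) \le \epsilon_k(h')$ for every $k \in \{u, c, p\}$, with strict inequality for at least one $k$. A feasible $h$ is Pareto optimal if no feasible $h'$ dominates it, and the Pareto front is the set of objective vectors $F(h)$ of all Pareto optimal $h$; CMOSB returns an approximation of this set rather than a single "best" configuration.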
