ICAFS: Inter-Client-Aware Feature Selection for Vertical Federated Learning

Ruochen Jin; Boning Tong; Shu Yang; Bojian Hou; Li Shen

TL;DR

ICAFS addresses inter-client feature selection in vertical federated learning (VFL) without sharing private gradients. It introduces a three-stage framework that (i) generates label-aware synthetic data via a federated Wasserstein GAN, (ii) learns multiple gate-based embedding selectors on the synthetic data and ensembles them, and (iii) applies the learned selectors to real data to refine predictions. The approach achieves higher accuracy than current state-of-the-art methods on multiple real-world datasets (including ADNI, USPS, ALLAML, and TOX_171), scales to larger numbers of clients, and is robust to noise and privacy constraints. By decoupling feature selection from private gradients and exploiting inter-client feature interactions, ICAFS offers a practical, neural-network-friendly approach to privacy-preserving feature selection in VFL.

Abstract

Vertical federated learning (VFL) enables clients holding vertically partitioned data to collaboratively train machine learning models. Feature selection (FS) plays a crucial role in VFL because the features are distributed across multiple clients. In VFL, different clients possess distinct subsets of features for overlapping data samples, making the identification and selection of the most relevant features a complex yet essential task. Previous FS efforts have primarily focused on intra-client feature selection, overlooking vital feature interactions across clients and leading to subpar model performance. We introduce ICAFS, a novel multi-stage ensemble approach for effective FS in VFL that accounts for inter-client interactions. By combining conditional feature synthesis with multiple learnable feature selectors, ICAFS performs ensemble FS over these selectors using synthetic embeddings. This method bypasses the limitations of private gradient sharing and allows for model training on real data with refined embeddings. Experiments on multiple real-world datasets demonstrate that ICAFS surpasses current state-of-the-art methods in prediction accuracy.
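
For intuition only, the idea of ensembling several gate-based selectors and then applying the result to real embeddings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the gate parameterization or the ensembling rule, so the sigmoid gates, the simple averaging, the 0.5 threshold, and all function names and numeric values below are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Map a gate logit to a soft gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_gates(gate_logits_per_selector):
    """Average the soft gates of several selectors, dimension by dimension.

    Assumption: each selector is summarized by one logit per embedding
    dimension; averaging is one possible ensembling rule among many.
    """
    n_selectors = len(gate_logits_per_selector)
    n_dims = len(gate_logits_per_selector[0])
    return [
        sum(sigmoid(sel[d]) for sel in gate_logits_per_selector) / n_selectors
        for d in range(n_dims)
    ]


def select_dims(avg_gates, threshold=0.5):
    """Keep the dimensions whose averaged gate exceeds the threshold."""
    return [d for d, g in enumerate(avg_gates) if g > threshold]


def apply_mask(embedding, kept):
    """Zero out the embedding dimensions that were not selected."""
    kept = set(kept)
    return [v if d in kept else 0.0 for d, v in enumerate(embedding)]


# Hypothetical gate logits from three selectors trained on synthetic data.
# The 4-dim embedding concatenates two clients: dims 0-1 from client A,
# dims 2-3 from client B, so selection operates across clients.
logits = [
    [2.0, -1.5, 0.5, -3.0],
    [1.0, -2.0, 1.5, -2.5],
    [3.0, 0.2, -0.3, -1.0],
]

avg = ensemble_gates(logits)
kept = select_dims(avg)

# Apply the ensembled selection to a (hypothetical) real embedding.
refined = apply_mask([0.7, 0.1, -0.4, 0.9], kept)
```

In this toy setup the selection is made over the concatenated embedding, so a dimension from one client is judged in the context of the other client's dimensions, which is the inter-client aspect the abstract emphasizes. In practice the gates would be learned by gradient descent on synthetic embeddings rather than fixed by hand.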
