Causal Multi-Label Feature Selection in Federated Setting
Yukun Song, Dayuan Cao, Jiali Miao, Shuai Yang, Kui Yu
TL;DR
FedCMFS tackles causal multi-label feature selection under data privacy constraints by introducing a horizontal federated framework with three subroutines: FedCFL learns local causal parents and children for each label, FedCFR retrieves potentially missed causal features, and FedCFC corrects false positives using DAG symmetry. The approach aggregates local CI results with client-weighted averages, enabling global PC(Y) construction without sharing raw data. Empirical results across eight real datasets and six metrics show FedCMFS achieving the best average ranking, performing especially well on high-dimensional data, and benefiting from GPU-accelerated CI tests to reduce runtime. The work advances privacy-preserving causal feature selection in federated, multi-label scenarios and suggests future work for improving performance in small-sample regimes.
Abstract
Multi-label feature selection serves as an effective means for dealing with high-dimensional multi-label data. To achieve satisfactory performance, existing methods for multi-label feature selection often require the centralization of substantial data from multiple sources. However, in a federated setting, centralizing data from all sources and merging them into a single dataset is not feasible. To tackle this issue, in this paper, we study the challenging problem of causal multi-label feature selection in a federated setting and propose a Federated Causal Multi-label Feature Selection (FedCMFS) algorithm with three novel subroutines. Specifically, FedCMFS first uses the FedCFL subroutine, which considers label-label, label-feature, and feature-feature correlations, to learn the relevant features (candidate parents and children) of each class label while preserving data privacy without centralizing data. Second, FedCMFS employs the FedCFR subroutine to selectively recover the missed true relevant features. Finally, FedCMFS utilizes the FedCFC subroutine to remove false relevant features. Extensive experiments on 8 datasets show that FedCMFS is effective for causal multi-label feature selection in a federated setting.
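As a rough illustration of two ideas summarized above (aggregating local conditional independence results with client-weighted averages, and using DAG symmetry to remove false positives), the following minimal Python sketch may help. It is not the authors' implementation. It assumes that each client reports a p-value for a given CI test and that the weights are local sample sizes; neither detail is specified here. The function names `aggregate_ci`, `is_dependent`, and `symmetry_correct`, and the threshold `alpha`, are hypothetical.

```python
from typing import Dict, List, Set


def aggregate_ci(local_pvalues: List[float], client_sizes: List[int]) -> float:
    """Return the sample-size-weighted average of per-client CI p-values.

    Assumption: clients share only p-values and their sample counts,
    never raw data. The weighting scheme is illustrative.
    """
    if not local_pvalues or len(local_pvalues) != len(client_sizes):
        raise ValueError("need one p-value and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return sum(p * n for p, n in zip(local_pvalues, client_sizes)) / total


def is_dependent(local_pvalues: List[float], client_sizes: List[int],
                 alpha: float = 0.05) -> bool:
    """Treat two variables as dependent if the aggregated p-value is <= alpha."""
    return aggregate_ci(local_pvalues, client_sizes) <= alpha


def symmetry_correct(pc: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep X in PC(Y) only if Y is also in PC(X).

    In a DAG, the parent-child relation is symmetric in this sense,
    so a one-sided membership is treated as a false positive.
    """
    return {
        y: {x for x in members if y in pc.get(x, set())}
        for y, members in pc.items()
    }


if __name__ == "__main__":
    # Two hypothetical clients with 300 and 100 samples.
    print(aggregate_ci([0.01, 0.20], [300, 100]))
    print(is_dependent([0.01, 0.20], [300, 100]))

    # "B" appears in PC("Y") but "Y" is not in PC("B"), so "B" is dropped.
    candidate_pc = {"Y": {"A", "B"}, "A": {"Y"}, "B": set()}
    print(symmetry_correct(candidate_pc))
```

Weighting by sample size lets larger clients contribute more to the global decision, which is one plausible reading of "client-weighted"; other weights could be substituted without changing the structure of the sketch.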
