Embedded Federated Feature Selection with Dynamic Sparse Training: Balancing Accuracy-Cost Tradeoffs

Afsaneh Mahanipour, Hana Khamfroush

TL;DR

This work tackles the cost bottlenecks of horizontal federated learning on high-dimensional data by introducing DSFFS, an embedded federated feature selection method integrated with federated dynamic sparse training. DSFFS dynamically prunes and regrows input-layer neurons and connections while maintaining a fixed sparsity, enabling selection of $K$ informative features and reducing both communication and computation costs. Across nine real-world datasets, DSFFS with FedDST (and FedDST+FedProx) achieves superior accuracy-cost trade-offs, outperforming state-of-the-art FFS baselines and reducing FLOPs and upload costs in many settings. This approach offers a practical pathway to scalable, privacy-preserving FL on heterogeneous, resource-constrained edge devices.

Abstract

Federated Learning (FL) enables multiple resource-constrained edge devices with varying levels of heterogeneity to collaboratively train a global model. However, devices with limited capacity can create bottlenecks and slow down model convergence. One effective way to address this issue is efficient feature selection, which reduces overall resource demands by lowering communication and computation costs, thereby mitigating the impact of struggling nodes. Existing federated feature selection (FFS) methods either run as a separate step from FL or rely on a third party. Both approaches increase computation and communication overhead, making them impractical for real-world high-dimensional datasets. To address this, we present Dynamic Sparse Federated Feature Selection (DSFFS), the first embedded FFS method that is efficient in both communication and computation. In the proposed method, feature selection occurs simultaneously with model training. During training, input-layer neurons, their connections, and hidden-layer connections are dynamically pruned and regrown, eliminating uninformative features. This process enhances computational efficiency on devices, improves network communication efficiency, and boosts global model performance. Experiments are conducted on nine real-world datasets of varying dimensionality from diverse domains, including biology, image, speech, and text. The results under a realistic non-iid data distribution setting show that our approach achieves a better trade-off between accuracy, computation, and communication costs by selecting more informative features than other state-of-the-art FFS methods.
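
The prune-and-regrow idea described above can be illustrated with a minimal, single-device sketch. This is not the authors' algorithm: the magnitude-based pruning criterion, the random regrowth rule, the zero initialization of regrown connections, and the weight-sum ranking of features are all assumptions chosen for illustration, and the function names `prune_and_regrow` and `select_top_k_features` are hypothetical. The sketch covers only the input layer and leaves out the federated aggregation and the hidden-layer connections.

```python
import numpy as np


def prune_and_regrow(weights, mask, prune_frac, rng):
    """One prune/regrow step on an input layer of shape (n_features, n_hidden).

    Assumed rule (not taken from the paper): drop the smallest-magnitude
    active connections, then regrow the same number at random among the
    positions that were inactive before pruning, so sparsity stays fixed.
    """
    flat_w = weights.ravel().copy()
    flat_m = mask.ravel().copy()
    active = np.flatnonzero(flat_m)
    inactive = np.flatnonzero(~flat_m)
    n_swap = min(int(prune_frac * active.size), inactive.size)
    if n_swap == 0:
        return weights.copy(), mask.copy()

    # Prune: deactivate the n_swap active connections with the smallest |w|.
    order = np.argsort(np.abs(flat_w[active]))
    dropped = active[order[:n_swap]]
    flat_m[dropped] = False
    flat_w[dropped] = 0.0

    # Regrow: activate n_swap previously inactive connections, starting at zero.
    grown = rng.choice(inactive, size=n_swap, replace=False)
    flat_m[grown] = True
    flat_w[grown] = 0.0

    return flat_w.reshape(weights.shape), flat_m.reshape(mask.shape)


def select_top_k_features(weights, mask, k):
    """Rank input features by the summed |w| of their active connections
    (an assumed importance score) and return the indices of the top k."""
    strength = np.abs(weights * mask).sum(axis=1)
    return np.sort(np.argsort(strength)[-k:])


# Toy usage: 20 input features, 8 hidden units, roughly 25% density.
rng = np.random.default_rng(0)
n_features, n_hidden = 20, 8
mask = rng.random((n_features, n_hidden)) < 0.25
weights = rng.normal(size=(n_features, n_hidden)) * mask

new_w, new_m = prune_and_regrow(weights, mask, prune_frac=0.2, rng=rng)
top = select_top_k_features(new_w, new_m, k=5)
```

Because each step removes and adds the same number of connections, the count of nonzero mask entries is unchanged, which is what keeps the cost of computing and uploading the layer bounded in this toy setting.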

Paper Structure

This paper contains 15 sections, 8 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Test accuracy of FedDST on an artificial dataset with original and noisy features.
  • Figure 2: Overview of the proposed method DSFFS for embedded-based federated feature selection.
  • Figure 3: Test accuracy vs. cumulative upload cost on non-iid MNIST.
  • Figure 4: Test accuracy vs. cumulative upload cost on non-iid Fashion-MNIST.
  • Figure 5: Test accuracy vs. cumulative upload cost on non-iid COIL-20.
  • ...and 1 more figure