Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection

Rui Liu, Tao Zhe, Yanjie Fu, Feng Xia, Ted Senator, Dongjie Wang

TL;DR

High-dimensional feature selection is challenged by complex feature interactions, permutation bias, and privacy constraints in distributed settings. The authors introduce CAPS, a centralized framework that combines permutation-invariant embeddings with policy-guided multi-objective search, and extend it to FedCAPS for privacy-preserving federated knowledge fusion and sample-aware weighting. The approach achieves strong or state-of-the-art performance across 14 datasets and demonstrates robustness across downstream tasks and federated scenarios. This work offers a practical, scalable pathway for identifying informative feature subsets without sharing raw data, with broad implications for secure and effective feature selection in real-world applications.

Abstract

Feature selection eliminates redundancy among features to improve downstream task performance while reducing computational overhead. Existing methods often struggle to capture intricate feature interactions and adapt across diverse application scenarios. Recent advances employ generative intelligence to alleviate these drawbacks. However, these methods remain constrained by permutation sensitivity in embedding and reliance on convexity assumptions in gradient-based search. To address these limitations, our initial work introduces a novel framework that integrates permutation-invariant embedding with policy-guided search. Although effective, it left room for adaptation to realistic distributed scenarios. In practice, data across local clients are highly imbalanced and heterogeneous, and strict privacy regulations limit direct sharing. These challenges highlight the need for a framework that can integrate feature selection knowledge across clients without exposing sensitive information. In this extended journal version, we advance the framework from two perspectives: (1) developing a privacy-preserving knowledge fusion strategy to derive a unified representation space without sharing sensitive raw data; and (2) incorporating a sample-aware weighting strategy to address distributional imbalance among heterogeneous local clients. Extensive experiments validate the effectiveness, robustness, and efficiency of our framework. The results further demonstrate its strong generalization ability in federated learning scenarios. The code and data are publicly available: https://anonymous.4open.science/r/FedCAPS-08BF.
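
The abstract does not specify how the sample-aware weighting is computed. As a minimal sketch, assuming a FedAvg-style rule in which each client's contribution is proportional to its sample count, the fusion step could look like the following. The function name `sample_aware_average`, the parameter vectors, and the client sizes are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def sample_aware_average(client_params, client_sizes):
    """Average per-client parameter vectors, weighting each by its sample count.

    Assumption: weights are proportional to the number of local samples
    (FedAvg-style). The paper's actual weighting rule may differ.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    if sizes.sum() <= 0:
        raise ValueError("at least one client must hold samples")
    weights = sizes / sizes.sum()  # normalise so the weights sum to 1
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    return weights @ stacked  # weighted sum over the client axis

# Three hypothetical clients with imbalanced sample counts.
clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 300, 600]
fused = sample_aware_average(clients, sizes)
```

Only the parameter vectors and the sample counts leave each client in this sketch; the raw data stay local, which matches the privacy constraint described above.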

Paper Structure

This paper contains 27 sections, 11 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: An overview of our centralized model CAPS. First, we develop an encoder-decoder module to learn a permutation-invariant embedding space by optimizing the reconstruction loss (a). Then, we explore the learned embedding space through policy-guided RL search, aiming to maximize downstream task performance and minimize feature subset length (b). Panels (c) and (d) illustrate the architectures of the induced set attention block (ISAB) and pooling by multihead attention (PMA), respectively.
  • Figure 2: An overview of our federated model FedCAPS.
  • Figure 3: The impact of data collection ($^{-c}$), permutation invariance ($^{-e}$), and RL search ($^{-p}$) (CAPS: (a)(b), FedCAPS: (c)(d)).
  • Figure 4: The visualization of original and permuted feature subset embeddings (CAPS: (a)(b), FedCAPS: (c)(d)).
  • Figure 5: Comparison of different downstream ML models in terms of Micro-F1 on UrbanSound (CAPS).
  • ...and 5 more figures
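
The Figure 1 caption names pooling by multihead attention (PMA) as one component of the permutation-invariant encoder. The sketch below is a simplified, single-head illustration of why attention pooling with a fixed seed query is permutation-invariant: reordering the input set reorders the attention weights and the values in the same way, so their weighted sum is unchanged. It is not the authors' implementation. The names `attention_pool`, `seed`, `W_k`, and `W_v` are hypothetical, the weights are random rather than learned, and the multihead projections and feed-forward layers of a full PMA block are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(X, seed, W_k, W_v):
    """Pool a set of n embeddings X with shape (n, d) into one vector.

    A single seed query attends over all set elements; the output is a
    weighted sum of values, which does not depend on the row order of X.
    """
    K = X @ W_k                                # keys, shape (n, d)
    V = X @ W_v                                # values, shape (n, d)
    scores = (K @ seed) / np.sqrt(K.shape[1])  # one score per element, shape (n,)
    weights = softmax(scores)                  # attention weights over the set
    return weights @ V                         # pooled vector, shape (d,)

rng = np.random.default_rng(0)
n, d = 6, 4                                    # 6 feature tokens, embedding size 4
X = rng.normal(size=(n, d))
seed = rng.normal(size=d)                      # stands in for a learned seed vector
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

pooled = attention_pool(X, seed, W_k, W_v)
pooled_permuted = attention_pool(X[rng.permutation(n)], seed, W_k, W_v)
```

Comparing `pooled` and `pooled_permuted` with `np.allclose` confirms that the two orderings of the same feature subset map to the same embedding up to floating-point rounding.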