Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent

Xinyuan Wang, Dongjie Wang, Wangyang Ying, Rui Xie, Haifeng Chen, Yanjie Fu

TL;DR

This work introduces a feature selection framework, guided by knockoff features and optimized through reinforcement learning, to identify an effective feature subset.

Abstract

Feature selection improves the AI-readiness of data by eliminating redundant features. Prior research falls into two primary categories: i) Supervised Feature Selection (SFS), which identifies the optimal feature subset based on each feature's relevance to the target variable; ii) Unsupervised Feature Selection (UFS), which reduces the dimensionality of the feature space by capturing the essential information within the feature set without using the target variable. However, SFS approaches are time-consuming and generalize poorly because they depend on the target variable and downstream ML tasks. UFS methods are limited because the reduced feature space is latent and untraceable. To address these challenges, we introduce a feature selection framework that is guided by knockoff features and optimized through reinforcement learning to identify an effective feature subset. In detail, our method generates "knockoff" features that replicate the distribution and characteristics of the original features but are independent of the target variable. Each feature is then assigned a pseudo label based on its correlation with all the knockoff features, serving as a novel metric for feature evaluation. Our approach uses these pseudo labels to guide the feature selection process in three ways, optimized by a single reinforced agent: 1) A deep Q-network, pre-trained with the original features and their corresponding pseudo labels, improves the efficacy of exploration in feature selection. 2) Unsupervised rewards evaluate feature subset quality based on the pseudo labels and the feature space reconstruction loss, reducing dependence on the target variable. 3) A new ε-greedy strategy incorporates insights from the pseudo labels to make the feature selection process more effective.
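
The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how knockoffs are constructed or how correlations are turned into labels, so everything below is an assumption for illustration: the knockoffs are a simple column-wise permutation (a stand-in that preserves each feature's marginal distribution, not the paper's knockoff construction), the score is the mean absolute Pearson correlation with all knockoffs, and the median threshold and its direction are hypothetical choices. The function names `make_permutation_knockoffs` and `pseudo_labels` are likewise hypothetical.

```python
import numpy as np


def make_permutation_knockoffs(X, rng):
    """Stand-in knockoffs (assumption): shuffle each column independently.

    Each knockoff column keeps the marginal distribution of its original
    feature, but its row order no longer lines up with any target variable.
    """
    X_tilde = np.empty_like(X)
    for j in range(X.shape[1]):
        X_tilde[:, j] = rng.permutation(X[:, j])
    return X_tilde


def pseudo_labels(X, X_tilde):
    """Score each feature by its correlation with all knockoff features.

    Assumption: the score is the mean absolute Pearson correlation, and a
    feature gets label 1 when its score is at or below the median score.
    """
    n_features = X.shape[1]
    # Correlation matrix of the 2d stacked columns; rowvar=False treats
    # columns as variables.
    corr = np.corrcoef(np.hstack([X, X_tilde]), rowvar=False)
    # Top-right block: original features (rows) vs. knockoffs (columns).
    cross = np.abs(corr[:n_features, n_features:])
    scores = cross.mean(axis=1)
    labels = (scores <= np.median(scores)).astype(int)
    return scores, labels


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))  # toy data: 200 samples, 6 features
X_tilde = make_permutation_knockoffs(X, rng)
scores, labels = pseudo_labels(X, X_tilde)
print(labels)
```

No target variable appears anywhere in this sketch, which is the property the abstract relies on: the labels depend only on the feature matrix and its knockoffs.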

Paper Structure (30 sections, 10 equations, 16 figures, 1 table)

This paper contains 30 sections, 10 equations, 16 figures, 1 table.

Figures (16)

  • Figure 1: The Unsupervised Feature Selection Structure.
  • Figure 2: The Overall Structure. We employ a single agent to make decisions for each feature. Utilizing the Knockoff information, along with an autoencoder, our innovative structure incorporates four key elements: 1) Pre-training: We pre-train the decision network of the Deep Q-Network using feature labels to guide the agent in making decisions for each feature. 2) $\varepsilon$-Greedy Policy: In addition to random exploration, we use feature labels to guide the agent in the $\varepsilon$-greedy policy. 3) Knockoff Reward: We compare the agent's decisions with feature labels to generate an unsupervised reward function. 4) Matrix Reconstruction Reward: We obtain representation vectors for both the original feature set and the feature subset, calculating an unsupervised matrix reconstruction reward.
  • Figure 3: The Structure of Pre-train Stage and Decision-making Stage.
  • Figure 4: The Structure of Renewed $\varepsilon$-greedy Policy.
  • Figure 5: The Matrix Reconstruction Structure. Two Auto-Encoders are trained separately to generate representation vectors of the two feature sets, and the difference between these representations is used to construct the reward function.
  • ...and 11 more figures
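
The label-guided $\varepsilon$-greedy policy and the two unsupervised rewards named in the Figure 2 and Figure 5 captions can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the captions do not give formulas, so the `guide_prob` parameter, the agreement-rate form of the knockoff reward, and the negative mean squared difference used for the reconstruction reward are all hypothetical, as are the function names. The representation vectors are assumed to have equal length and to come from separately trained autoencoders, which are not shown here.

```python
import numpy as np


def guided_epsilon_greedy(q_values, pseudo_label, epsilon, guide_prob, rng):
    """Choose deselect (0) or select (1) for a single feature.

    With probability epsilon the agent explores; during exploration it
    follows the pseudo label with probability guide_prob (an assumed
    parameter) and otherwise acts uniformly at random. With probability
    1 - epsilon it exploits the Q-values.
    """
    if rng.random() < epsilon:
        if rng.random() < guide_prob:
            return int(pseudo_label)
        return int(rng.integers(0, 2))
    return int(np.argmax(q_values))


def knockoff_reward(actions, labels):
    """Assumed form: fraction of decisions that agree with the pseudo labels."""
    return float(np.mean(np.asarray(actions) == np.asarray(labels)))


def reconstruction_reward(z_full, z_subset):
    """Assumed form: negative mean squared difference between the
    representation of the full feature set and that of the selected subset."""
    z_full = np.asarray(z_full, dtype=float)
    z_subset = np.asarray(z_subset, dtype=float)
    return float(-np.mean((z_full - z_subset) ** 2))


rng = np.random.default_rng(0)
q = np.array([0.2, 0.9])  # toy Q-values for [deselect, select]
action = guided_epsilon_greedy(q, pseudo_label=0, epsilon=0.3, guide_prob=0.5, rng=rng)
print(action, knockoff_reward([1, 0, 1, 1], [1, 0, 0, 1]))
```

Setting `guide_prob` to 0 recovers the ordinary $\varepsilon$-greedy policy, so the guidance from pseudo labels can be treated as a tunable bias on exploration rather than a hard constraint.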