PPFS: Predictive Permutation Feature Selection
Atif Hassan, Jiaul H. Paik, Swanand Khare, Syed Asif Hassan
TL;DR
PPFS addresses universal feature selection by learning the Markov Blanket (MB) of a target $Y$ through a wrapper approach that handles mixed-type features and supports both classification and regression. It introduces the Predictive Permutation Independence (PPI) test, a knockoff-based, non-parametric conditional independence (CI) test that uses supervised learners to compare the predictive risk obtained with the real features against the risk obtained with their knockoffs. The PPI test guides MB discovery via a Growth/Shrink procedure, followed by a $K$-fold MB aggregation step. The method yields accurate, compact MBs and, across 12 datasets, improves predictive performance and feature reduction relative to state-of-the-art MB discovery and wrapper methods. A sketch of the proof of correctness is included, and an open-source implementation is provided to facilitate adoption and further research.
Abstract
We propose Predictive Permutation Feature Selection (PPFS), a novel wrapper-based feature selection method based on the concept of the Markov Blanket (MB). Unlike previous MB methods, PPFS is a universal feature selection technique: it works for both classification and regression tasks on datasets containing categorical and/or continuous features. We also propose Predictive Permutation Independence (PPI), a new Conditional Independence (CI) test, which enables PPFS to be categorised as a wrapper feature selection method. This is in contrast to current filter-based MB feature selection techniques, which are unable to harness advances in supervised algorithms such as Gradient Boosting Machines (GBM). The PPI test is based on the knockoff framework and uses supervised algorithms to measure the association between an individual feature, or a set of features, and the target variable. We further propose a novel MB aggregation step that addresses the issue of sample inefficiency. Empirical evaluations and comparisons on a large number of datasets demonstrate that PPFS outperforms state-of-the-art Markov blanket discovery algorithms as well as well-known wrapper methods. We also provide a sketch of the proof of correctness of our method. An implementation of this work is available at \url{https://github.com/atif-hassan/PyImpetus}
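
To make the idea behind a predictive, permutation-based CI test concrete, the following is a minimal sketch and not the PPI test as implemented in PyImpetus. It assumes that a knockoff of a feature is simply a random permutation of its column, uses ordinary least squares as a stand-in for a stronger learner such as a GBM, and summarises the comparison with a plain permutation p-value; the function names (`fit_ols`, `mse`, `permutation_ci_pvalue`) and all parameters are hypothetical.

```python
import random

def fit_ols(X, y):
    """Least-squares fit with intercept (stand-in learner, not a GBM)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        A[i][i] += 1e-8  # tiny ridge term for numerical stability
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def mse(beta, X, y):
    """Mean squared error of the linear model beta on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return sum((q - t) ** 2 for q, t in zip(preds, y)) / len(y)

def permutation_ci_pvalue(rows, y, feature, cond_set, n_perm=30, seed=0):
    """Permutation p-value for 'feature is independent of y given cond_set'.

    Assumption: the knockoff is a shuffled copy of the feature column.
    A small p-value means the real feature predicts y better than its
    knockoffs, i.e. evidence against conditional independence.
    """
    rng = random.Random(seed)
    n = len(rows)
    split = n // 2  # first half for training, second half held out

    def heldout_risk(feature_values):
        X = [[rows[i][c] for c in cond_set] + [feature_values[i]]
             for i in range(n)]
        beta = fit_ols(X[:split], y[:split])
        return mse(beta, X[split:], y[split:])

    real = [r[feature] for r in rows]
    risk_real = heldout_risk(real)
    knockoff_as_good = 0
    for _ in range(n_perm):
        knockoff = real[:]
        rng.shuffle(knockoff)
        if heldout_risk(knockoff) <= risk_real:
            knockoff_as_good += 1
    return (1 + knockoff_as_good) / (1 + n_perm)

# Toy data: y depends on column 0 only; column 1 is pure noise.
_rng = random.Random(42)
rows = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * r[0] + 0.1 * _rng.gauss(0, 1) for r in rows]

p_relevant = permutation_ci_pvalue(rows, y, feature=0, cond_set=[1])
p_irrelevant = permutation_ci_pvalue(rows, y, feature=1, cond_set=[0])
```

In this toy setting, a feature whose p-value falls below a chosen significance level would be a candidate for the Markov Blanket, while a feature whose knockoffs predict about as well as the real column would be dropped. A plain shuffle also breaks the dependence between the tested feature and the conditioning set, so this sketch is only a rough approximation of a proper conditional test.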
