MINERVA: Mutual Information Neural Estimation for Supervised Feature Selection
Taurai Muvunza, Egor Kraev, Pere Planell-Morell, Alexander Y. Shestopaloff
TL;DR
MINERVA performs supervised feature selection by training a neural mutual information estimator to quantify the joint dependence between features and the target, then applying sparsity-inducing regularizers in a two-stage training scheme. By leveraging the Donsker-Varadhan (DV) dual representation of the KL divergence and a flexible neural architecture, it can capture higher-order interactions that traditional pairwise filters miss. Empirical results on synthetic tasks demonstrate exact feature recovery, while results on real-world fraud data show superior out-of-sample performance when using MINERVA-selected features, supporting its practical value for high-stakes, imbalanced problems. The framework provides a scalable, interpretable filter with theoretical grounding and an accessible implementation, enabling application across domains that demand reliable feature selection under complex dependencies.
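For readers unfamiliar with the DV (Donsker-Varadhan) representation mentioned above, its standard form and the mutual information lower bound it yields can be written as follows. The notation is an assumption made for illustration and is not taken from the paper: $X$ denotes the features, $Y$ the target, and $T_\theta$ a neural network with parameters $\theta$; the supremum over $T$ ranges over functions for which both expectations are finite.

```latex
\begin{align}
D_{\mathrm{KL}}(P \,\|\, Q)
  &= \sup_{T} \; \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\!\left[e^{T}\right], \\
I(X;Y)
  &= D_{\mathrm{KL}}\!\left(P_{XY} \,\|\, P_X \otimes P_Y\right)
  \;\ge\; \sup_{\theta} \; \mathbb{E}_{P_{XY}}[T_\theta]
        - \log \mathbb{E}_{P_X \otimes P_Y}\!\left[e^{T_\theta}\right].
\end{align}
```

The inequality holds because a parameterized family $\{T_\theta\}$ covers only a subset of all admissible functions $T$, so maximizing over $\theta$ gives a lower bound on the mutual information.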
Abstract
Existing feature filters rely on statistical pairwise dependence metrics to model feature-target relationships, but this approach may fail when the target depends on higher-order feature interactions rather than individual contributions. We introduce the Mutual Information Neural Estimation Regularized Vetting Algorithm (MINERVA), a novel approach to supervised feature selection based on neural estimation of mutual information between features and targets. We parameterize the approximation of mutual information with neural networks and perform feature selection using a carefully designed loss function augmented with sparsity-inducing regularizers. Our method is implemented as a two-stage process that decouples representation learning from feature selection, ensuring better generalization and a more accurate expression of feature importance. We present examples of ubiquitous dependency structures that are rarely captured in the literature and show that our proposed method effectively captures these complex feature-target relationships by evaluating feature subsets as an ensemble. Experimental results on synthetic and real-life fraud datasets demonstrate the efficacy of our method and its ability to recover exact solutions.
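The claim that pairwise filters can miss higher-order interactions can be illustrated with a small sketch. The XOR setup, the sample size, and the helper `plugin_mi` are assumptions chosen for illustration; they are not taken from the paper, and the plug-in estimator used here is a simple discrete estimator, not MINERVA's neural estimator.

```python
import numpy as np

def plugin_mi(a, b):
    """Plug-in estimate of mutual information (in nats) between two
    discrete 1-D arrays, computed from their empirical joint distribution."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)   # count co-occurrences
    joint /= joint.sum()                    # normalize to probabilities
    pa = joint.sum(axis=1, keepdims=True)   # marginal of a
    pb = joint.sum(axis=0, keepdims=True)   # marginal of b
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

rng = np.random.default_rng(0)
n = 20_000
x1 = rng.integers(0, 2, size=n)   # independent fair bits
x2 = rng.integers(0, 2, size=n)
y = x1 ^ x2                       # target depends only on the interaction

mi_x1 = plugin_mi(x1, y)               # pairwise: feature 1 vs target
mi_x2 = plugin_mi(x2, y)               # pairwise: feature 2 vs target
mi_joint = plugin_mi(2 * x1 + x2, y)   # the pair (x1, x2) encoded as one symbol

print(f"I(x1; y)      = {mi_x1:.4f} nats")
print(f"I(x2; y)      = {mi_x2:.4f} nats")
print(f"I((x1,x2); y) = {mi_joint:.4f} nats")
```

Each feature on its own carries almost no information about the target, while the pair carries close to log 2 nats, so a filter that scores features one at a time would discard both of them.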
