Statistical Hypothesis Testing for Information Value (IV)
Helder Rojas, Cirilo Alvarez, Nilton Rojas
TL;DR
This work addresses the lack of statistical justification for fixed IV thresholds by linking Information Value to the Jeffreys divergence and introducing a nonparametric J-Divergence test with asymptotic guarantees. The authors establish almost-sure consistency and asymptotic normality for the IV estimator, derive a normal-based test statistic, and validate its performance through simulations and a real fraud-detection dataset. The approach yields a robust, interpretable pre-modeling filter that outperforms traditional IV thresholding in imbalanced settings, and it is complemented by an open-source Python library for practical adoption. Overall, the J-Divergence test offers a principled, model-agnostic alternative for feature selection prior to modeling, with clear pathways for extension to multinomial settings and integrated pipelines.
Abstract
Information Value (IV) is a widely used technique for feature selection prior to the modeling phase, particularly in credit scoring and related domains. However, conventional IV-based practices rely on fixed empirical thresholds, which lack statistical justification and may be sensitive to data characteristics such as class imbalance. In this work, we develop a formal statistical framework for IV by establishing its connection with Jeffreys divergence, and we propose a novel nonparametric hypothesis test, referred to as the J-Divergence test. Our method provides rigorous asymptotic guarantees and enables interpretable decisions based on \(p\)-values. Numerical experiments on synthetic and real-world data demonstrate that the proposed test is more reliable than traditional IV thresholding, particularly under strong imbalance. The test is model-agnostic, computationally efficient, and well-suited for the pre-modeling phase in high-dimensional or imbalanced settings. An open-source Python library is provided for reproducibility and practical adoption.
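To make the link between IV and the Jeffreys divergence concrete, the following is a minimal sketch, not the authors' library or their test statistic. It assumes a feature that has already been binned, with every bin non-empty in both classes; the function names and the toy counts are hypothetical. It computes IV in its usual form, sum over bins of (p_i - q_i) ln(p_i / q_i), and checks numerically that this equals the symmetrized Kullback-Leibler divergence KL(P||Q) + KL(Q||P), which is the Jeffreys divergence.

```python
import numpy as np


def _bin_proportions(good_counts, bad_counts):
    """Turn per-bin class counts into the two within-class distributions."""
    g = np.asarray(good_counts, dtype=float)
    b = np.asarray(bad_counts, dtype=float)
    if np.any(g <= 0) or np.any(b <= 0):
        # Assumption of this sketch: no empty bins, so all logs are finite.
        raise ValueError("every bin must be non-empty in both classes")
    return g / g.sum(), b / b.sum()


def information_value(good_counts, bad_counts):
    """Classical IV: sum_i (p_i - q_i) * ln(p_i / q_i)."""
    p, q = _bin_proportions(good_counts, bad_counts)
    return float(np.sum((p - q) * np.log(p / q)))


def jeffreys_divergence(good_counts, bad_counts):
    """Jeffreys divergence: KL(P||Q) + KL(Q||P)."""
    p, q = _bin_proportions(good_counts, bad_counts)
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return float(kl_pq + kl_qp)


# Hypothetical, strongly imbalanced toy data: 1000 "good" vs 100 "bad" cases.
good = [400, 300, 200, 100]
bad = [10, 20, 30, 40]

iv = information_value(good, bad)
jd = jeffreys_divergence(good, bad)
print(f"IV = {iv:.6f}, Jeffreys divergence = {jd:.6f}")
```

The two quantities agree up to floating-point error, and both are zero when the two class distributions over the bins coincide. The sketch stops at the point estimate: deciding whether an observed IV is significantly different from zero is the role of the J-Divergence test described above.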
