Statistical Test for Auto Feature Engineering by Selective Inference

Tatsuya Matsukawa, Tomohiro Shiraishi, Shuichi Nishino, Teruyuki Katsuoka, Ichiro Takeuchi

TL;DR

A new statistical test for features generated by AFE algorithms, based on the selective inference framework, quantifies the statistical significance of those features in the form of $p$-values, enabling theoretically guaranteed control of the risk of false findings.

Abstract

Auto Feature Engineering (AFE) plays a crucial role in developing practical machine learning pipelines by automating the transformation of raw data into meaningful features that enhance model performance. By generating features in a data-driven manner, AFE enables the discovery of important features that may not be apparent through human experience or intuition. On the other hand, since AFE generates features based on data, there is a risk that these features may be overly adapted to the data, making it essential to assess their reliability appropriately. Unfortunately, because most AFE problems are formulated as combinatorial search problems and solved by heuristic algorithms, it has been challenging to theoretically quantify the reliability of generated features. To address this issue, we propose a new statistical test for features generated by AFE algorithms, based on a framework called selective inference. As a proof of concept, we consider a simple class of tree search-based heuristic AFE algorithms and address the problem of testing the generated features when they are used in a linear model. The proposed test can quantify the statistical significance of the generated features in the form of $p$-values, enabling theoretically guaranteed control of the risk of false findings.

Paper Structure

This paper contains 33 sections, 2 theorems, 25 equations, 11 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Consider a fixed design matrix $X$, a random response vector $\bm{Y}\sim\mathcal{N}(\bm{\mu}, \Sigma)$ and an observed response vector $\bm{y}$. Let $\mathcal{F}_{\bm{Y}}$ and $\mathcal{F}_{\bm{y}}$ be the obtained set of generated features, by applying an auto feature engineering algorithm in the f is a truncated normal distribution $\mathrm{TN}(\bm{\eta}^\top\bm{\mu}, \bm{\eta}^\top\Sigma\bm{\et
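
To make the role of the truncated normal distribution concrete, the following is a minimal sketch of how a selective $p$-value could be computed once the truncation region is known. It is not the authors' implementation. It assumes a scalar test statistic of the form $\bm{\eta}^\top\bm{y}$, a null hypothesis under which its mean is $0$, a two-sided test, and a truncation region given as a finite union of intervals. The function names `norm_cdf`, `interval_mass`, and `selective_p_value` are hypothetical.

```python
import math


def norm_cdf(x):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interval_mass(lo, hi, sd):
    """Probability that a N(0, sd^2) variable falls in [lo, hi]."""
    return norm_cdf(hi / sd) - norm_cdf(lo / sd)


def selective_p_value(t_obs, variance, intervals):
    """Two-sided p-value of t_obs under N(0, variance) truncated to `intervals`.

    Assumptions (not taken from the source): the null mean is 0, the test is
    two-sided, and `intervals` is a list of disjoint (lo, hi) pairs whose
    union is the truncation region. The observed statistic must lie in it.
    """
    sd = math.sqrt(variance)
    if not any(lo <= t_obs <= hi for lo, hi in intervals):
        raise ValueError("observed statistic lies outside the truncation region")

    # Normalizing constant: mass of the untruncated normal on the region.
    total = sum(interval_mass(lo, hi, sd) for lo, hi in intervals)

    # Mass of the part of the region that is less extreme than |t_obs|.
    c = abs(t_obs)
    inner = 0.0
    for lo, hi in intervals:
        a, b = max(lo, -c), min(hi, c)
        if a < b:
            inner += interval_mass(a, b, sd)

    return 1.0 - inner / total


# With no truncation the result matches the usual two-sided z-test.
p_naive = selective_p_value(1.96, 1.0, [(-math.inf, math.inf)])
# Conditioning on a region that excludes small values makes 1.96 less surprising.
p_selective = selective_p_value(1.96, 1.0, [(1.0, math.inf)])
```

In this toy example the untruncated $p$-value is about $0.05$, while conditioning on the region $[1, \infty)$ raises it to about $0.16$. This is the sense in which ignoring the selection event overstates significance. The difference of CDFs used here loses precision far in the tails, so a production implementation would need a more careful numerical treatment.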

Figures (11)

  • Figure 1: Overview of the proposed framework for evaluating the significance of features generated by the AFE algorithm. The input dataset $(X,\bm{y})$ is processed by an AFE algorithm to generate a set of features $\mathcal{F}$. The significance of each generated feature is then evaluated using the proposed selective inference method. Note that directly applying traditional statistical tests to generated features can lead to inflated type I error rates, because these features are data-dependent.
  • Figure 2: Schematic illustration of the feature generation process as a directed tree search.
  • Figure 3: Schematic illustration of the AFE algorithm. The search proceeds roughly as follows: at each increase in depth, $N(=4)$ nodes are randomly generated (each node holds a feature set that adds one new feature to that of its parent node). Here, $V_i^j$ denotes a node, where the subscript $i$ is the depth of the node and the superscript $j$ is its rank among the nodes at the same depth, ordered by AIC. Then, the $M(=2)$ nodes with the best AIC among the $N$ generated nodes are selected as parents of the nodes at the next depth (nodes filled in gray). This operation is repeated until the maximum depth $D$ is reached. Furthermore, to handle exceptions, we check whether the AIC has improved compared to the best AIC at the previous depth and record, for each node, the number of consecutive times the AIC has not improved (note that it is reset to $0$ at $V_3^1$). The AFE algorithm prevents any node whose counter is greater than or equal to $\gamma(=2)$ from becoming a parent node. In this figure, at depth $3$, no node other than $V_3^1$ can become a parent, so the next feature generation is performed only from $V_3^1$.
  • Figure 4: Schematic illustration of the proposed line search method for identifying the truncation intervals $\mathcal{Z}$. We first compute an interval within which the features generated by the AFE algorithm remain unchanged. Then, we identify the truncation intervals $\mathcal{Z}$ by taking the union of such intervals using parametric programming.
  • Figure 5: Type I Error Rate and Power for $\Sigma=I_n$
  • ...and 6 more figures
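
The search described in the Figure 3 caption can be sketched roughly as follows. This is an illustrative reading of the caption, not the authors' algorithm. It assumes that the $N$ children at each depth are drawn in total (not per parent), that candidate features come from a pre-enumerated pool, and that the AIC of a linear model is supplied by the caller as a `score` function where lower is better. The names `tree_search_afe`, `candidates`, and `score` are hypothetical.

```python
import random


def tree_search_afe(base_features, candidates, score, N=4, M=2, D=3, gamma=2, seed=0):
    """Heuristic tree search over feature sets, loosely following Figure 3.

    base_features: tuple of starting feature names.
    candidates:    pool of feature names that may be added (an assumption;
                   the source generates features by transformations).
    score:         callable(feature_tuple) -> float, lower is better
                   (a stand-in for the AIC of a fitted linear model).
    Returns the best feature tuple found and its score.
    """
    rng = random.Random(seed)
    root = tuple(base_features)
    best_node, best_score = root, score(root)

    # Each parent carries a counter of consecutive non-improvements.
    parents = [(root, 0)]
    prev_depth_best = best_score

    for _depth in range(1, D + 1):
        children = []
        for _ in range(N):
            feats, stall = rng.choice(parents)
            unused = [c for c in candidates if c not in feats]
            if not unused:
                continue
            child = feats + (rng.choice(unused),)
            s = score(child)
            # Reset the counter on improvement over the previous depth's best.
            child_stall = 0 if s < prev_depth_best else stall + 1
            children.append((s, child, child_stall))

        if not children:
            break

        # Rank nodes at this depth by score (best first).
        children.sort(key=lambda item: item[0])
        if children[0][0] < best_score:
            best_score, best_node = children[0][0], children[0][1]
        prev_depth_best = children[0][0]

        # Keep the top M nodes, excluding those that stalled gamma times.
        parents = [(f, st) for _, f, st in children[:M] if st < gamma]
        if not parents:
            break

    return best_node, best_score
```

Because the search is driven by a seeded random generator, repeated calls with the same arguments return the same feature set. Any real use would replace `score` with an AIC computed from an actual model fit on $(X, \bm{y})$.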

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2