Dynamic Feature Selection from Variable Feature Sets Using Features of Features

Katsumi Takahashi, Koh Takeuchi, Hisashi Kashima

TL;DR

This work tackles dynamic feature selection (DFS) when the set of measurable features varies across instances, by introducing "features of features" as prior information about each feature. It extends existing DFS based on conditional mutual information (CMI) to a variable feature-set setting, training a policy and a predictor that take both measured feature values and per-feature priors as input, using amortized optimization and permutation-invariant architectures. The proposed method achieves higher accuracy than random selection and fixed-set baselines on image (MNIST, FashionMNIST, CIFAR-10) and document (BBCSport, 20NEWS) classification tasks, especially at small budgets, and reveals interpretable feature-selection patterns. The approach offers a principled, cost-aware framework for instance-dependent feature acquisition, applicable in domains where feature costs and availability vary.

Abstract

Machine learning models usually assume that a set of feature values used to obtain an output is fixed in advance. However, in many real-world problems, a cost is associated with measuring these features. To address the issue of reducing measurement costs, various methods have been proposed to dynamically select which features to measure, but existing methods assume that the set of measurable features remains constant, which makes them unsuitable for cases where the set of measurable features varies from instance to instance. To overcome this limitation, we define a new problem setting for Dynamic Feature Selection (DFS) with variable feature sets and propose a deep learning method that utilizes prior information about each feature, referred to as "features of features". Experimental results on several datasets demonstrate that the proposed method effectively selects features based on the prior information, even when the set of measurable features changes from instance to instance.
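
To illustrate why per-feature priors let one model handle feature sets of any size, here is a minimal sketch of a permutation-invariant scoring function in the Deep Sets style. It is not the authors' architecture: the names `score_features`, `w_embed`, `w_score`, the random weights, and the dimensions are all assumptions made for illustration. Each measured pair (prior vector, value) is embedded and summed into a context, and each unmeasured feature is scored from its own prior vector plus that context.

```python
import math
import random

def make_weights(rows, cols, rng):
    """Random weight matrix as a list of rows (hypothetical initialisation)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def linear_tanh(weights, vec):
    """Compute tanh(W @ vec) for a list-of-rows matrix W."""
    return [math.tanh(sum(w * v for w, v in zip(row, vec))) for row in weights]

def score_features(z, x, measured, w_embed, w_score):
    """Return one score per feature; already-measured features get -inf.

    z:        list of per-feature prior vectors ("features of features")
    x:        list of feature values (only read where measured[i] is True)
    measured: list of booleans marking already-measured features
    """
    hidden = len(w_embed)
    # Embed each measured (z_i, x_i) pair and sum: an order-independent context.
    context = [0.0] * hidden
    for z_i, x_i, m in zip(z, x, measured):
        if m:
            emb = linear_tanh(w_embed, z_i + [x_i])
            context = [c + e for c, e in zip(context, emb)]
    # Score each unmeasured feature from its own prior plus the shared context.
    scores = []
    for z_i, m in zip(z, measured):
        if m:
            scores.append(float("-inf"))
        else:
            scores.append(sum(w * v for w, v in zip(w_score, z_i + context)))
    return scores

rng = random.Random(0)
Z_DIM, HIDDEN = 2, 4  # assumed sizes, chosen only for the example
w_embed = make_weights(HIDDEN, Z_DIM + 1, rng)
w_score = [rng.gauss(0.0, 0.5) for _ in range(Z_DIM + HIDDEN)]

# An instance with 5 measurable features; features 1 and 3 are already measured.
z = [[rng.random(), rng.random()] for _ in range(5)]
x = [rng.random() for _ in range(5)]
measured = [False, True, False, True, False]
scores = score_features(z, x, measured, w_embed, w_score)

# Reordering the features reorders the scores in the same way.
perm = [3, 0, 4, 1, 2]
scores_perm = score_features([z[p] for p in perm], [x[p] for p in perm],
                             [measured[p] for p in perm], w_embed, w_score)

# The same weights also score an instance with only 3 measurable features.
scores_small = score_features(z[:3], x[:3], measured[:3], w_embed, w_score)
```

Because the weights act on one feature at a time and the context is a sum, nothing in the function depends on how many features an instance has or on their order.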

Paper Structure

This paper contains 18 sections, 2 theorems, 6 equations, 6 figures, 1 algorithm.

Key Result

Theorem

When $l$ is the cross-entropy loss, the global optimum of exist_loss_rewrite satisfies the following: the predictor is the Bayes classifier $f_{\theta^*}(x_s)=p(\mathbf{y}|x_s)$, and the policy puts all probability mass on $i^*= \operatorname{argmax}_{i} I(\mathbf{y};\mathbf{x}_i|x_s)$.
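
The selection rule $i^*= \operatorname{argmax}_{i} I(\mathbf{y};\mathbf{x}_i|x_s)$ can be checked by brute force on a small discrete problem. The sketch below is a toy under stated assumptions, not the paper's training procedure: the generative model (a binary label whose features are noisy copies of it) and the helpers `build_joint`, `conditional_mi`, and `greedy_choice` are all hypothetical. It enumerates the joint distribution exactly instead of learning a policy.

```python
import itertools
import math

def build_joint(flip_probs):
    """Joint p(y, x_0..x_{d-1}) for a toy model: y ~ Bernoulli(0.5), and each
    x_i copies y but is flipped independently with probability flip_probs[i]."""
    joint = {}
    for y in (0, 1):
        for xs in itertools.product((0, 1), repeat=len(flip_probs)):
            p = 0.5
            for x_i, flip in zip(xs, flip_probs):
                p *= flip if x_i != y else 1.0 - flip
            joint[(y,) + xs] = p
    return joint

def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcome -> probability."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def conditional_mi(joint, i, observed):
    """I(y; x_i | x_s = observed), where observed maps feature index -> value."""
    # Keep outcomes consistent with the observed values, marginalised to (y, x_i).
    pair = {}
    for outcome, p in joint.items():
        y, xs = outcome[0], outcome[1:]
        if all(xs[j] == v for j, v in observed.items()):
            pair[(y, xs[i])] = pair.get((y, xs[i]), 0.0) + p
    total = sum(pair.values())
    pair = {k: p / total for k, p in pair.items()}
    p_y, p_x = {}, {}
    for (y, x_i), p in pair.items():
        p_y[y] = p_y.get(y, 0.0) + p
        p_x[x_i] = p_x.get(x_i, 0.0) + p
    # I(y; x_i) = H(y) + H(x_i) - H(y, x_i), all conditioned on x_s = observed.
    return entropy(p_y) + entropy(p_x) - entropy(pair)

def greedy_choice(joint, num_features, observed):
    """Index of the unmeasured feature with the largest conditional MI."""
    candidates = [i for i in range(num_features) if i not in observed]
    return max(candidates, key=lambda i: conditional_mi(joint, i, observed))

# Feature 0 is the cleanest copy of y, feature 1 is noisier, feature 2 is pure noise.
joint = build_joint([0.1, 0.3, 0.5])
first = greedy_choice(joint, 3, {})
second = greedy_choice(joint, 3, {0: 1})
```

In this toy model the greedy rule picks the least noisy feature first, then the next most informative one given what has been observed. Exact enumeration scales exponentially in the number of features, which is the practical reason for learning a policy that approximates the argmax.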

Figures (6)

  • Figure 1: Graphical model of our problem setting and selection/prediction process: $y$ is the response variable, $x_{\lambda_i}$ is the feature value, and $z_{\lambda_i}$ is the corresponding features of features. The index set $\Lambda^{(n)}=\{\lambda_1,...,\lambda_{d^{(n)}}\}$ can differ from instance to instance. The dependencies between $x_{\lambda_i}$ and $y$ follow the naive Bayes assumption. The value of $x_{\lambda_i}$ depends on $y$ and $z_{\lambda_i}$. Variables with a gray background are unknown and those with a white background are known.
  • Figure 2: Our model and data flow. First, the policy selects features iteratively based on the revealed features and the features of features; then the predictor makes the final prediction from the revealed information.
  • Figure 3: Accuracy for image classification. The horizontal axis is the number of selected features, and the vertical axis is the accuracy of classification. The dotted line in each figure is the chance rate.
  • Figure 4: Frequency of feature selection for each image dataset. More frequently selected positions are lighter in color.
  • Figure 5: The trade-off of accuracy and feature budget for document classification. The dotted line in each figure is the chance rate.
  • ...and 1 more figure
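
The data flow described in the Figure 2 caption (iterative selection by a policy, then a final prediction) can be sketched as a short loop. This is an illustration under assumptions, not the paper's algorithm: `acquire_and_predict`, `toy_policy`, `toy_predictor`, the feature names, and the scalar priors are all hypothetical stand-ins for the learned networks and real data.

```python
def acquire_and_predict(z, measure, policy, predictor, budget):
    """Measure up to `budget` features one at a time, then predict.

    z:         dict index -> features of features for this instance's measurable set
    measure:   callable index -> feature value (this is the costly step)
    policy:    callable (z, revealed) -> dict index -> score over unmeasured indices
    predictor: callable (z, revealed) -> prediction
    """
    revealed = {}
    for _ in range(min(budget, len(z))):
        scores = policy(z, revealed)
        best = max(scores, key=scores.get)  # highest-scoring unmeasured feature
        revealed[best] = measure(best)
    return predictor(z, revealed), revealed

def toy_policy(z, revealed):
    # Hypothetical rule: prefer unmeasured features with the largest prior weight.
    return {i: z_i for i, z_i in z.items() if i not in revealed}

def toy_predictor(z, revealed):
    # Hypothetical rule: prior-weighted vote over the revealed binary values.
    vote = sum(z[i] * (1 if v else -1) for i, v in revealed.items())
    return 1 if vote > 0 else 0

# Two instances whose measurable feature sets differ (only "f7" is shared).
instance_a = {"z": {"f1": 0.9, "f4": 0.2, "f7": 0.6}, "x": {"f1": 1, "f4": 0, "f7": 1}}
instance_b = {"z": {"f2": 0.3, "f7": 0.8}, "x": {"f2": 0, "f7": 0}}

pred_a, revealed_a = acquire_and_predict(
    instance_a["z"], instance_a["x"].get, toy_policy, toy_predictor, budget=2)
pred_b, revealed_b = acquire_and_predict(
    instance_b["z"], instance_b["x"].get, toy_policy, toy_predictor, budget=5)
```

Keying everything by feature index, with the prior attached to each index, is what lets the same policy and predictor run on both instances even though their measurable sets differ.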

Theorems & Definitions (2)

  • Theorem
  • Theorem