A Survey on Active Feature Acquisition Strategies

Arman Rahbar; Linus Aronsson; Morteza Haghir Chehreghani

TL;DR

This survey addresses how to make accurate predictions while minimizing the cost of feature acquisition. It reviews three main families of approaches: greedy methods based on information-theoretic criteria (notably conditional mutual information and EC$^2$), embedded methods that integrate feature querying into model training and inference, and MDP/RL-based methods that optimize long-horizon feature acquisition policies. It explains core formulations, such as the per-instance objective that balances prediction loss against acquisition cost, and compares generative and discriminative strategies for estimating information gain, model-based and model-free RL, offline and online settings, and search and imitation techniques. The survey highlights open challenges, including the need for robust online benchmarks, theoretical guarantees, and improved conditional-distribution modeling (e.g., via partial VAEs and surrogate models), to enable scalable, explainable active feature acquisition (AFA) systems. Overall, it organizes the landscape of methods, clarifies their trade-offs, and outlines directions that can advance cost-aware predictive systems in domains where data collection is expensive or invasive.

Abstract

Active feature acquisition studies the challenge of making accurate predictions while limiting the cost of collecting complete data. By selectively acquiring only the most informative features for each instance, these strategies enable efficient decision-making in scenarios where data collection is expensive or time-consuming. This survey reviews recent progress in active feature acquisition, discussing common problem formulations, practical challenges, and key insights. We also highlight open issues and promising directions for future research.
