AFABench: A Generic Framework for Benchmarking Active Feature Acquisition
Valter Schütz, Han Wu, Reza Rezvan, Linus Aronsson, Morteza Haghir Chehreghani
TL;DR
AFABench provides the first standardized benchmark framework for Active Feature Acquisition (AFA), enabling fair, reproducible comparisons across a wide range of methods and datasets. It formalizes AFA as a budgeted decision process, implements a modular evaluation pipeline, and includes representative myopic, non-myopic RL, non-RL, and static methods, plus a novel CUBE-NM dataset designed to stress lookahead capabilities. The experimental results show that non-myopic strategies can outperform myopic approaches in some settings, but many real-world tasks are well served by strong myopic baselines and static methods, while RL methods often incur higher compute cost and exhibit instability. Overall, AFABench delineates when lookahead pays off, provides actionable guidance for method choice under different budgets and data regimes, and offers a scalable platform for advancing cost-aware learning and feature acquisition research.
Abstract
In many real-world scenarios, acquiring all features of a data instance can be expensive or impractical due to monetary cost, latency, or privacy concerns. Active Feature Acquisition (AFA) addresses this challenge by dynamically selecting a subset of informative features for each data instance, trading predictive performance against acquisition cost. While numerous methods have been proposed for AFA, ranging from myopic information-theoretic strategies to non-myopic reinforcement learning approaches, fair and systematic evaluation of these methods has been hindered by a lack of standardized benchmarks. In this paper, we introduce AFABench, the first benchmark framework for AFA. Our benchmark includes a diverse set of synthetic and real-world datasets, supports a wide range of acquisition policies, and provides a modular design that enables easy integration of new methods and tasks. We implement and evaluate representative algorithms from all major categories, including static, myopic, and reinforcement learning-based approaches. To test the lookahead capabilities of AFA policies, we introduce a novel synthetic dataset, CUBE-NM, designed to expose the limitations of myopic selection. Our results highlight key trade-offs between different AFA strategies and provide actionable insights for future research. The benchmark code is available at: https://github.com/Linusaronsson/AFA-Benchmark.
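To make the per-instance acquisition loop described in the abstract concrete, the following is a minimal Python sketch of a myopic (greedy) policy operating under a fixed budget. It is an illustration only and does not reflect the AFABench API: the names `acquire_features` and `toy_score`, the unit cost per feature, and the scoring rule are all assumptions introduced here.

```python
from typing import Callable, Dict, Sequence

# Hypothetical signature: score(observed, j) estimates how useful it would be
# to acquire feature j next, given the features observed so far.
ScoreFn = Callable[[Dict[int, float], int], float]


def acquire_features(x: Sequence[float], budget: int, score: ScoreFn) -> Dict[int, float]:
    """Greedy (myopic) acquisition with unit cost per feature (an assumption).

    At each step, pick the unobserved feature with the highest score, reveal
    its value, and stop once the budget is spent or no features remain.
    """
    observed: Dict[int, float] = {}
    for _ in range(min(budget, len(x))):
        candidates = [j for j in range(len(x)) if j not in observed]
        best = max(candidates, key=lambda j: score(observed, j))
        observed[best] = x[best]  # "pay" for the feature and reveal its value
    return observed


def toy_score(observed: Dict[int, float], j: int) -> float:
    """Toy scoring rule (hypothetical): feature 2 is always acquired first,
    and its sign decides whether feature 0 or feature 1 is preferred next."""
    if 2 not in observed:
        return 1.0 if j == 2 else 0.0
    preferred = 0 if observed[2] > 0 else 1
    return 1.0 if j == preferred else 0.0


if __name__ == "__main__":
    # Two instances with different values at feature 2 follow different paths.
    print(acquire_features([5.0, 7.0, 1.0, 9.0], budget=2, score=toy_score))
    print(acquire_features([5.0, 7.0, -1.0, 9.0], budget=2, score=toy_score))
```

Because the score depends on the values already observed, the two instances acquire different feature subsets under the same budget, which is what distinguishes dynamic acquisition from static feature selection. A non-myopic policy would replace the one-step `max` with a choice that accounts for the value of later acquisitions.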
