AFABench: A Generic Framework for Benchmarking Active Feature Acquisition

Valter Schütz, Han Wu, Reza Rezvan, Linus Aronsson, Morteza Haghir Chehreghani

TL;DR

AFABench provides the first standardized benchmark framework for Active Feature Acquisition (AFA), enabling fair, reproducible comparisons across a wide range of methods and datasets. It formalizes AFA as a budgeted decision process, implements a modular evaluation pipeline, and includes representative myopic, non-myopic (RL and non-RL), and static methods, plus a novel synthetic dataset, CUBE-NM, designed to stress-test lookahead capabilities. The experiments show that non-myopic strategies can outperform myopic ones in some settings, but many real-world tasks are well served by strong myopic baselines and static methods, while RL methods often incur higher compute costs and exhibit training instability. Overall, AFABench delineates when lookahead pays off, provides actionable guidance on method choice under different budgets and data regimes, and offers a scalable platform for research on cost-aware learning and feature acquisition.

Abstract

In many real-world scenarios, acquiring all features of a data instance can be expensive or impractical due to monetary cost, latency, or privacy concerns. Active Feature Acquisition (AFA) addresses this challenge by dynamically selecting a subset of informative features for each data instance, trading predictive performance against acquisition cost. While numerous methods have been proposed for AFA, ranging from myopic information-theoretic strategies to non-myopic reinforcement learning approaches, fair and systematic evaluation of these methods has been hindered by a lack of standardized benchmarks. In this paper, we introduce AFABench, the first benchmark framework for AFA. Our benchmark includes a diverse set of synthetic and real-world datasets, supports a wide range of acquisition policies, and provides a modular design that enables easy integration of new methods and tasks. We implement and evaluate representative algorithms from all major categories, including static, myopic, and reinforcement learning-based approaches. To test the lookahead capabilities of AFA policies, we introduce a novel synthetic dataset, CUBE-NM, designed to expose the limitations of myopic selection. Our results highlight key trade-offs between different AFA strategies and provide actionable insights for future research. The benchmark code is available at: https://github.com/Linusaronsson/AFA-Benchmark.
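To make "AFA as a budgeted decision process" concrete, here is a minimal sketch of a single episode under a hard budget: at each step a policy sees only the features acquired so far, either picks one more feature (paying its cost) or stops, and the episode ends once the next acquisition would exceed the budget. The names `run_episode` and `fixed_order_policy` and the policy signature are hypothetical illustrations, not the AFABench API; the fixed-order policy stands in for a static baseline.

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

# Hypothetical policy interface: given observed values (NaN where still masked)
# and the boolean acquisition mask, return a feature index to acquire, or None to stop.
Policy = Callable[[np.ndarray, np.ndarray], Optional[int]]


def run_episode(x: np.ndarray, costs: Sequence[float], policy: Policy,
                budget: float) -> Tuple[np.ndarray, float]:
    """Run one hard-budget AFA episode; return the final mask and total cost."""
    mask = np.zeros(x.shape[0], dtype=bool)
    spent = 0.0
    while True:
        observed = np.where(mask, x, np.nan)  # the policy never sees masked values
        action = policy(observed, mask)
        if action is None or mask[action]:
            break  # policy stopped (or tried to re-acquire a feature)
        if spent + costs[action] > budget:
            break  # acquiring this feature would exceed the hard budget
        mask[action] = True
        spent += costs[action]
    return mask, spent


def fixed_order_policy(order: Sequence[int]) -> Policy:
    """Static baseline: acquire features in one global order for every instance."""
    def policy(observed: np.ndarray, mask: np.ndarray) -> Optional[int]:
        for i in order:
            if not mask[i]:
                return i
        return None
    return policy


# Usage: four unit-cost features, budget of two acquisitions.
x = np.array([0.3, -1.2, 2.5, 0.0])
mask, spent = run_episode(x, costs=[1, 1, 1, 1],
                          policy=fixed_order_policy([2, 0, 3, 1]), budget=2)
print(mask, spent)
```

A myopic or RL policy would plug into the same `policy` slot; only the rule for choosing the next feature changes, which is what lets such a loop compare very different methods under one protocol.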

Paper Structure

This paper contains 63 sections, 2 theorems, 10 equations, 27 figures, 2 tables.

Key Result

Theorem 4.1

Consider the CUBE-NM dataset with $n_c$ contexts in the noiseless regime $\sigma=0$ (such that $100\%$ prediction accuracy is possible). Assume uniform unit acquisition costs, i.e., $c_i=1$ for all $i\in[d]$. Then the myopic conditional mutual information (CMI) policy requires, in expectation across instances, $13(2n_c+1)
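Why can a myopic CMI policy need extra acquisitions? It scores each unobserved feature by $I(X_i; Y \mid x_S)$ given only the current observations, so features that are informative only jointly look worthless one at a time. The sketch below illustrates this on a hypothetical toy dataset (not CUBE-NM, and not the construction behind Theorem 4.1): $y = x_0 \oplus x_1$, plus a distractor $x_2$ that agrees with $y$ on 3 of every 4 rows. It assumes discrete features and plug-in empirical estimates, where conditioning on $x_S$ means restricting to rows that match the observed values; `mutual_information` and `cmi_scores` are illustrative helpers.

```python
from collections import Counter

import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray) -> float:
    """Plug-in estimate of I(A; B) in bits for two discrete arrays."""
    a, b = a.tolist(), b.tolist()
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return float(sum((c / n) * np.log2((c / n) / ((pa[u] / n) * (pb[v] / n)))
                     for (u, v), c in joint.items()))


def cmi_scores(X: np.ndarray, y: np.ndarray, observed: dict) -> list:
    """Myopic score I(X_i; Y | X_S = x_S) for each feature (-inf if already acquired)."""
    rows = np.ones(len(y), dtype=bool)
    for j, v in observed.items():
        rows &= X[:, j] == v  # keep only rows consistent with the observed values
    return [-np.inf if i in observed else mutual_information(X[rows, i], y[rows])
            for i in range(X.shape[1])]


# Toy data: y = x0 XOR x1; x2 is a noisy copy of y (agrees on 3 of 4 rows).
records = []
for x0 in (0, 1):
    for x1 in (0, 1):
        label = x0 ^ x1
        for x2 in (label, label, label, 1 - label):
            records.append((x0, x1, x2, label))
data = np.array(records)
X, y = data[:, :3], data[:, 3]

step1 = cmi_scores(X, y, observed={})        # x0 and x1 each score exactly 0 bits
first = int(np.argmax(step1))                # so the greedy pick is the distractor x2
step2 = cmi_scores(X, y, observed={first: X[0, first]})  # x0, x1 still score 0
lookahead = cmi_scores(X, y, observed={0: 0})            # x1 is worth 1 bit once x0 is known
```

With a budget of two acquisitions, the greedy policy here spends its first step on the distractor, whereas acquiring $x_0$ then $x_1$ would determine $y$ exactly. A policy with lookahead is meant to capture exactly this kind of joint value.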

Figures (27)

  • Figure 1: An AFA episode where the instance $x \in \mathcal{X}$ is an image.
  • Figure 2: An example of how an Unmasker could calculate the cost of an action, given feature costs from a dataset.
  • Figure 3: The pipeline consists of three stages; optional stages are indicated by dotted lines.
  • Figure 4: Visualization of (a) CUBE, and (b) our proposed synthetic dataset Cube-NM.
  • Figure 5: Results for the soft-budget setting using a shared external classifier across all methods. Not all methods are applicable in the soft-budget setting (see main text).
  • ...and 22 more figures
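Figures 1 and 2 suggest that an action need not correspond to a single feature: for image instances, an Unmasker can translate one action into a set of features and compute the action's cost from per-feature costs. The sketch below is one hypothetical reading of that idea, not the AFABench Unmasker: the `PatchUnmasker` class, the non-overlapping square patches, and the rule of charging only for pixels not already observed are all assumptions made for illustration.

```python
import numpy as np


class PatchUnmasker:
    """Hypothetical unmasker: action k reveals the k-th non-overlapping
    patch x patch block of an H x W image (row-major patch order)."""

    def __init__(self, height: int, width: int, patch: int):
        if height % patch or width % patch:
            raise ValueError("image size must be divisible by the patch size")
        self.h, self.w, self.p = height, width, patch
        self.n_cols = width // patch

    def features_for(self, action: int) -> np.ndarray:
        """Boolean H x W mask of the pixels revealed by `action`."""
        r, c = divmod(action, self.n_cols)
        m = np.zeros((self.h, self.w), dtype=bool)
        m[r * self.p:(r + 1) * self.p, c * self.p:(c + 1) * self.p] = True
        return m

    def action_cost(self, action: int, feature_costs: np.ndarray,
                    mask: np.ndarray) -> float:
        """Sum the per-pixel costs of pixels this action would newly reveal."""
        new = self.features_for(action) & ~mask  # assumption: no re-charging
        return float(feature_costs[new].sum())


# Usage: 4x4 image, 2x2 patches, top row of pixels twice as expensive.
costs = np.ones((4, 4))
costs[0, :] = 2.0
unmasker = PatchUnmasker(4, 4, patch=2)
mask = np.zeros((4, 4), dtype=bool)
c_top_left = unmasker.action_cost(0, costs, mask)      # 2 + 2 + 1 + 1
c_bottom_right = unmasker.action_cost(3, costs, mask)  # four unit-cost pixels
mask[0, 0] = True                                      # one pixel already observed
c_top_left_partial = unmasker.action_cost(0, costs, mask)
```

Keeping the action-to-feature mapping in a separate component means the same policy and budget logic can be reused whether actions reveal single tabular features or whole image patches.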

Theorems & Definitions (2)

  • Theorem 4.1: Informal
  • Theorem F.1: Noiseless CUBE-NM query complexity