Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings

Henrik von Kleist; Alireza Zamanian; Ilya Shpitser; Narges Ahmidi

TL;DR

This paper tackles active feature acquisition performance evaluation (AFAPE): estimating how an active feature acquisition (AFA) agent will perform once deployed, given that deployment changes the acquisition strategy and therefore shifts the data distribution. It develops three complementary views: offline reinforcement learning (under the NUC assumption), missing data combined with online RL (under the NDE assumption), and a novel semi-offline RL framework that combines online interaction with constrained offline exploration. For each view, it derives identification and estimation strategies, including DM, IPW, and DRL estimators; the semi-offline DRL estimator is doubly robust and more data-efficient. Semiparametric theory unifies the approaches and provides influence-function-based insights, while synthetic experiments show substantial efficiency gains and reliable evaluation under realistic violations of the assumptions. The work offers practical guidance for safely deploying AFA agents by enabling unbiased estimation of misclassification and acquisition costs across deployment scenarios.

Abstract

Machine learning methods often assume that input features are available at no cost. However, in domains like healthcare, where acquiring features can be expensive or harmful, it is necessary to balance a feature's acquisition cost against its predictive value. The task of training an AI agent to decide which features to acquire is called active feature acquisition (AFA). By deploying an AFA agent, we effectively alter the acquisition strategy and trigger a distribution shift. To safely deploy AFA agents under this distribution shift, we present the problem of active feature acquisition performance evaluation (AFAPE). We examine AFAPE under i) a no direct effect (NDE) assumption, stating that acquisitions do not affect the underlying feature values; and ii) a no unobserved confounding (NUC) assumption, stating that retrospective feature acquisition decisions were based only on observed features. We show that one can apply missing data methods under the NDE assumption and offline reinforcement learning under the NUC assumption. When both NUC and NDE hold, we propose a novel semi-offline reinforcement learning framework. This framework requires a weaker positivity assumption and introduces three new estimators: a direct method (DM) estimator, an inverse probability weighting (IPW) estimator, and a double reinforcement learning (DRL) estimator.
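To make the cost trade-off in the abstract concrete, the following minimal sketch adds up the cost of a single acquisition episode: the per-step acquisition costs plus a misclassification cost when the predicted label differs from the true label. The function name `episode_cost` and all numeric cost values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Sequence


def episode_cost(
    acquisition_costs: Sequence[float],
    predicted_label: int,
    true_label: int,
    misclassification_cost: float,
) -> float:
    """Total cost of one episode: acquisition costs plus a penalty on a wrong label."""
    total = float(sum(acquisition_costs))
    if predicted_label != true_label:
        total += misclassification_cost
    return total


# Hypothetical costs (assumptions for illustration only).
TROPONIN_COST = 1.0
CAG_COST = 5.0
MISCLASSIFICATION_COST = 20.0

# Both tests acquired, correct diagnosis.
cost_full_correct = episode_cost([TROPONIN_COST, CAG_COST], 1, 1, MISCLASSIFICATION_COST)

# Only the cheaper test acquired, wrong diagnosis.
cost_cheap_wrong = episode_cost([TROPONIN_COST], 0, 1, MISCLASSIFICATION_COST)
```

Under these made-up numbers, skipping the expensive test saves acquisition cost but ends up costlier once the misclassification penalty applies, which is the kind of trade-off an AFA agent has to weigh.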

Paper Structure (68 sections, 18 theorems, 144 equations, 11 figures, 2 tables)

Introduction
Heart Attack Diagnosis Example
Paper Goal
Paper Outline and Contributions
Background and Related Methods
Active Feature Acquisition (AFA)
(Offline) Reinforcement Learning (RL) / Dynamic Treatment Regimes (DTR)
Missing Data
Semiparametric Theory
Active Feature Acquisition Performance Evaluation (AFAPE)
No Direct Effect (NDE) Assumption
Distribution Shift Robust ML Models
Active Feature Acquisition Performance Evaluation (AFAPE) Problem Definition
Feature Acquisition Process
Classification Process
...and 53 more sections

Key Result

Theorem 1

(AFAPE problem reformulation and identification under the missing data view). The AFAPE problem of estimating $J$ (the AFAPE objective) is, under the assumptions of no measurement noise, consistency, no interference [...] Furthermore, $J$ is identified if $p(X_{(1)}, Y)$ is identified.

Figures (11)

Figure 1: AFA process for a simplified hypothetical heart attack diagnosis example. A patient with chest pain ($X^0$) prompts the doctor to first order a troponin lab test ($A^1$) and, upon reviewing the result ($X^1$), to also order a coronary angiography (CAG) ($A^2$). The feature acquisitions $A^1$ and $A^2$ produce feature acquisition costs $C_a^1$ and $C_a^2$. After the acquisition process concludes, the doctor makes a diagnosis $Y^*$, which, if different from the true underlying condition $Y$, produces a misclassification cost $C_{mc}$.
Figure 2: The causal graph depicting the AFA setting as a partially observable decision process consisting of unobserved underlying features $U^t$, feature acquisition actions $A^t$, feature measurements $X^t = G_{A^t}(U^t)$, and associated acquisition costs $C_a^t$. After a number of acquisition steps $T$ (here $T=2$), a classification $Y^*$ is to be performed. In the case of misclassification ($Y^*$ is not equal to the true label $Y$), a misclassification cost $C_{mc}$ is produced. Edges showing long-term dependencies are omitted from the graph for visual clarity. These include: $\underline{U}^{t-1}, \underline{X}^{t-1}, \underline{A}^{t-1} \rightarrow A^{t}$; $\underline{X}^{T}, \underline{A}^{T} \rightarrow Y^*$; $A^{t} \rightarrow \overline{U}^t$; $\underline{U}^{t-1} \leftrightarrow U^{t}$; $\underline{U}^{t-1} \rightarrow U^{t}$; $\underline{U}^{T} \leftrightarrow Y$ and $\underline{U}^{T} \rightarrow Y$ (where $\leftrightarrow$ denotes unobserved confounding).
Figure 3: Updated causal graph of the AFA setting under the NUC assumption and a latent projection. The graph depicts a standard, identified offline RL setting. Long-term dependencies are omitted from the graph for visual clarity. These include edges $\underline{X}^{t-1}, \underline{A}^{t-1} \rightarrow A^t$; $\underline{X}^{T}, \underline{A}^{T} \rightarrow C$; $\underline{X}^{t-1} \leftrightarrow X^{t}$ and $\underline{X}^{T} \leftrightarrow C$.
Figure 4: A) Updated causal graph of the AFA process under the NDE assumption. Unknown state variables $U^t$ are replaced with the counterfactual feature values $X_{(1)}^t$, which represent the values $X^t$ would have taken if $A^t$ were $\vec{1}$ (i.e., the decision to observe all feature values). This graph describing the feature acquisition process is known as a missing data graph (m-graph). B) Graph showing the counterfactual distribution under $\pi_\alpha$. Edges showing long-term dependencies are omitted for visual clarity. These include for both graphs $\underline{X}_{(1)}^{t-1} \leftrightarrow X_{(1)}^{t}$ and $\underline{X}_{(1)}^{T} \leftrightarrow Y$; for A) $\underline{X}^{t-1}, \underline{X}_{(1)}^{t-1}, \underline{A}^{t-1} \rightarrow A^{t}$, and $\underline{X}^{T}, \underline{A}^{T} \rightarrow Y^*$; and for B) $X^0,\underline{X}_{(\pi_{\alpha})}^{t-1}, \underline{A}_{(\pi_{\alpha})}^{t-1} \rightarrow A_{(\pi_{\alpha})}^{t}$ and $X^0,\underline{X}_{(\pi_{\alpha})}^{T}, \underline{A}_{(\pi_{\alpha})}^{T} \rightarrow Y_{(\pi_{\alpha})}^*$.
Figure 5: Visualization of data utilization by IPW estimators under different views. Each graph shows the four possible target acquisition trajectories for two exemplary retrospective acquisition scenarios and highlights which target trajectories can receive non-zero IPW weights under the respective views. A), D) The IPW estimator from the offline RL viewpoint: only target trajectories that match the retrospective trajectory can be evaluated. B), E) The IPW estimator from the missing data viewpoint: all trajectories can be evaluated if the datapoint is a complete case; otherwise, no evaluation is possible. C), F) The IPW estimator from the semi-offline RL viewpoint: all trajectories with equal or fewer acquisitions than the retrospective trajectory can be evaluated.
...and 6 more figures
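
The data-utilization rules described in the Figure 5 caption can be sketched in a few lines. The snippet below enumerates all binary target acquisition trajectories and reports which ones could receive a non-zero IPW weight under each view, given one retrospective trajectory. It is a rough illustration, not the paper's estimator: the function name `evaluable_targets` is hypothetical, each step is assumed to be a single acquire/skip decision, and "equal or fewer acquisitions" is interpreted as an elementwise comparison (the target acquires nothing that the retrospective trajectory skipped), which is an assumption about the caption's meaning.

```python
from itertools import product
from typing import Dict, List, Tuple

Trajectory = Tuple[int, ...]  # 1 = feature acquired at that step, 0 = skipped


def evaluable_targets(retro: Trajectory) -> Dict[str, List[Trajectory]]:
    """List target trajectories that can get non-zero IPW weight under each view."""
    targets = list(product((0, 1), repeat=len(retro)))
    complete_case = all(a == 1 for a in retro)
    return {
        # Offline RL view: the target must match the retrospective trajectory.
        "offline_rl": [t for t in targets if t == retro],
        # Missing data view: everything is usable, but only for complete cases.
        "missing_data": targets if complete_case else [],
        # Semi-offline RL view (assumed elementwise reading): the target may
        # only acquire features that were also acquired retrospectively.
        "semi_offline_rl": [
            t for t in targets if all(ti <= ri for ti, ri in zip(t, retro))
        ],
    }


# Retrospective data where only the first of two features was acquired.
partial = evaluable_targets((1, 0))

# Retrospective data where both features were acquired (a complete case).
full = evaluable_targets((1, 1))
```

For the partially observed datapoint, the offline RL view can evaluate one target trajectory, the missing data view none, and the semi-offline RL view two, which mirrors the caption's point that the semi-offline view makes use of more of the retrospective data.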

Theorems & Definitions (29)

Remark 1: Cross-fitting estimators
Theorem 1
Definition 1
Theorem 2
Remark 2: Comparison of AFAPE under offline RL and semi-offline RL
Definition 2
Definition 3
Lemma 1
Theorem 3
Theorem 4
...and 19 more
