NOCTA: Non-Greedy Objective Cost-Tradeoff Acquisition for Longitudinal Data

Dzung Dinh, Boqi Chen, Yunni Qu, Marc Niethammer, Junier Oliva

TL;DR

NOCTA tackles the challenge of inference-time feature acquisition in longitudinal data by introducing a non-greedy planning objective, NOCT, that jointly considers predictive loss and feature acquisition costs. It provides two estimators for approximating NOCT at inference: NOCT-Contrastive, an embedding-based retrieval method, and NOCT-Amortized, a neural predictor for NOCT values. Across synthetic and real medical datasets, NOCTA consistently outperforms RL-based and greedy baselines, achieving higher accuracy at lower data-acquisition costs and exhibiting early, adaptive acquisition behavior. This framework enables efficient, cost-aware decision support in settings where measurements are expensive or risky, with broad potential applications beyond healthcare.

Abstract

In many critical domains, features are not freely available at inference time: each measurement may come with a cost of time, money, and risk. Longitudinal prediction further complicates this setting because both features and labels evolve over time, and missing measurements at earlier timepoints may become permanently unavailable. We propose NOCTA, a Non-Greedy Objective Cost-Tradeoff Acquisition framework that sequentially acquires the most informative features at inference time while accounting for both temporal dynamics and acquisition cost. NOCTA is driven by a novel objective, NOCT, which evaluates a candidate set of future feature-time acquisitions by its expected predictive loss together with its acquisition cost. Since NOCT depends on unobserved future trajectories at inference time, we develop two complementary estimators: (i) NOCT-Contrastive, which learns an embedding of partial observations utilizing the induced distribution over future acquisitions, and (ii) NOCT-Amortized, which directly predicts NOCT for candidate plans with a neural network. Experiments on synthetic and real-world medical datasets demonstrate that both NOCTA estimators outperform existing baselines, achieving higher accuracy at lower acquisition costs.
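
Read together with the Figure 2 caption, the objective described above can be sketched as follows. This is a reconstruction from the prose, not a quotation of the paper's equation: the loss symbol $\ell_{\mathrm{CE}}$, the label $y_k$, and the exact conditioning of the expectation are assumed notation, and the paper may normalize or weight the terms differently.

```latex
% Sketch of the NOCT objective for a candidate plan v \subseteq \mathcal{V}_{>t},
% evaluated at the partially observed state (x_o, o, t).
% \ell_{CE} and y_k are assumed notation, not taken from the source.
\mathrm{NOCT}(x_o, o, t, v)
  \;=\;
  \mathbb{E}\!\left[
    \sum_{k=t+1}^{L}
      \ell_{\mathrm{CE}}\!\big(\hat{y}_{k}(x_{o \cup v_{\le k}}),\, y_k\big)
    \;\middle|\; x_o, o
  \right]
  \;+\;
  \alpha \sum_{(m, t') \in v} c^{m}
```

The first term is the expected accumulated predictive loss over the remaining timepoints, where the prediction at time $k$ only sees acquisitions scheduled up to $k$; the second term is the cost of the planned acquisitions scaled by $\alpha$. A lower value indicates a better plan.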

Paper Structure

This paper contains 34 sections, 2 theorems, 32 equations, 5 figures, 11 tables, 1 algorithm.

Key Result

Theorem 3.1

(Informal) $-\mathrm{NOCT}$ lower-bounds the value of the optimal policy of the longitudinal active feature acquisition (AFA) MDP.

Figures (5)

  • Figure 1: Illustration of clinical longitudinal AFA. At the current visit, an autonomous agent reviews previously collected and newly acquired data, predicts the patient's current status, and recommends a subset of examinations for a determined future timepoint. This process repeats until the agent recommends no further acquisition.
  • Figure 2: Overview of NOCTA. (a) NOCT evaluation of plan. Given partially observed state $(x_o, o, t)$, a candidate plan $v \subseteq \mathcal{V}_{>t}$ (a feature$\times$time matrix) is evaluated by the tradeoff of (i) the expected accumulated predictive cross-entropy (CE) loss for future acquisitions from $t+1$ to $L$ and (ii) the acquisition cost of selected future acquisitions (i.e., $\alpha \sum_{(m, t') \in v} c^m$). Under plan $v$, the prediction at future time $k$ only uses acquisitions up to time $k$ (i.e., $\hat{y}_{k}(x_{o\cup v_{\le k}})$). (b) Acquisition algorithm. At state $(x_o, o, t)$, the NOCTA algorithm evaluates candidate plans $v \subseteq \mathcal{V}_{>t}$ with a NOCT estimator. (1) We select the best plan, $\hat{u}$, according to the NOCT estimates. (2) We execute the selected plan up to its next acquisition (i.e., $t_{\text{next}}\equiv\min\{t':(m,t')\in \hat{u}\}$) by acquiring $a_{t_{\text{next}}} \equiv \{(m,t_{\text{next}})\in \hat{u}\}$. Predictions are made at every timepoint; e.g., as illustrated, with $t=2$ and $t_{\text{next}}=5$, NOCTA predicts at $t=3, 4$ using current information (i.e., $\hat{y}_{3}(x_o), \hat{y}_{4}(x_o)$) and predicts at $t_{\text{next}}=5$ using newly observed features (i.e., $\hat{y}_{5}(x_{o\cup a_{t_{\text{next}}}})$). (3) We reassess with $x_o \equiv x_{o\cup a_{t_{\text{next}}}}$ and $t \equiv t_{\text{next}}$, repeating until the selected plan is empty ($\hat{u}=\varnothing$) or when we reach the final time step ($t= L$).
  • Figure 3: Performance/cost of models across various average acquisition costs (budgets). Following Kossen et al. (2022), we show accuracy rather than average precision for the synthetic dataset. Our NOCT estimators show the highest accuracy for a given cost.
  • Figure 4: Qualitative comparison of acquisition dynamics on the test set of the ADNI dataset (regular follow-ups every six months). For each method, the top panel shows the average acquisition cost at each timepoint (mean per sample), and the middle panel shows the distribution of timepoints at which acquisition terminated. The bottom panel visualizes feature-time acquisition as a directed graph: each node corresponds to a feature acquired at a specific timepoint, and node size shows how often that feature is acquired at that time. A directed arrow from $(m,t)$ to $(m',t')$ (with $t'>t$) indicates frequent co-acquisition across time, i.e., samples that acquire $(m,t)$ also acquire $(m',t')$ later. Average acquisition cost per sample: $4.3775$ (NOCT-Contrastive), $3.7425$ (NOCT-Amortized), $4.0475$ (DIME), and $3.9750$ (RAS).
  • Figure 5: Additional results on performance/cost of models, measured by ROC AUC across varying average acquisition budgets on real‑world datasets. Best viewed in color.
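
The select, execute, reassess loop described in the Figure 2(b) caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `all_plans` exhaustively enumerates candidate plans (feasible only at toy sizes; how the paper generates candidates is not stated here), and `toy_noct` is a hypothetical stand-in for a learned NOCT estimator in which feature 0 is the only informative measurement. All function names, costs, and loss values are invented for the example.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List, Set, Tuple

Pair = Tuple[int, int]   # (feature index m, timepoint t')
Plan = FrozenSet[Pair]   # a candidate set of future acquisitions v


def all_plans(t: int, n_features: int, horizon: int) -> Iterator[Plan]:
    """Enumerate every subset of future (feature, time) pairs; toy sizes only."""
    pairs = [(m, s) for s in range(t + 1, horizon + 1) for m in range(n_features)]
    for r in range(len(pairs) + 1):
        for combo in combinations(pairs, r):
            yield frozenset(combo)


def toy_noct(observed: Plan, t: int, plan: Plan, horizon: int,
             costs: List[float], alpha: float) -> float:
    """Hypothetical NOCT stand-in: accumulated loss plus scaled acquisition cost.

    Assumed toy loss: 0.2 at time k once feature 0 has been acquired at some
    time <= k (already observed or scheduled by the plan), otherwise 1.0.
    """
    known = observed | plan
    loss = 0.0
    for k in range(t + 1, horizon + 1):
        has_f0 = any(m == 0 and s <= k for m, s in known)
        loss += 0.2 if has_f0 else 1.0
    return loss + alpha * sum(costs[m] for m, _ in plan)


def nocta_rollout(t: int, horizon: int, n_features: int,
                  estimate: Callable[[Plan, int, Plan], float]
                  ) -> List[Tuple[int, Plan]]:
    """Select the best plan, execute only its earliest step, then reassess."""
    observed: Set[Pair] = set()
    schedule: List[Tuple[int, Plan]] = []
    while t < horizon:
        state = frozenset(observed)
        best = min(all_plans(t, n_features, horizon),
                   key=lambda v: estimate(state, t, v))
        if not best:          # empty plan selected: stop acquiring
            break
        t_next = min(s for _, s in best)
        acquired = frozenset((m, s) for m, s in best if s == t_next)
        observed |= acquired  # reveal the newly acquired measurements
        schedule.append((t_next, acquired))
        t = t_next            # reassess from the new state
    return schedule


COSTS = [1.0, 5.0]  # assumed per-feature costs c^m
schedule = nocta_rollout(
    t=0, horizon=3, n_features=2,
    estimate=lambda o, t, v: toy_noct(o, t, v, horizon=3, costs=COSTS, alpha=0.5),
)
```

With these toy numbers the rollout acquires feature 0 at the first future timepoint and then selects the empty plan, so `schedule` holds a single step. Only the earliest step of each selected plan is executed before re-planning, which is what lets the procedure revise later acquisitions once new measurements are observed.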

Theorems & Definitions (2)

  • Theorem 3.1
  • Proposition 3.2