Cohort-Based Active Modality Acquisition

Tillmann Rheude, Roland Eils, Benjamin Wild

TL;DR

This work introduces Cohort-based Active Modality Acquisition (CAMA), a test-time, cohort-level framework for allocating a scarce, costly modality to a subset of samples so as to maximize a global predictive metric. It develops acquisition functions that combine generative imputation with discriminative scoring to estimate the counterfactual benefit of acquiring a modality, along with upper-bound benchmarks. Across four multimodal datasets, including a large-scale UK Biobank cohort, imputation-based strategies, particularly a KL-divergence-based approach, consistently outperform unimodal baselines, entropy-driven methods, and random selection under budget constraints. The study also provides architectural and training insights, showing that coherence between the generative and discriminative components and robust handling of imputed latents are essential for effective cohort-level modality allocation, with real-world applicability in healthcare and beyond.

Abstract

Real-world machine learning applications often involve data from multiple modalities that must be integrated effectively to make robust predictions. However, in many practical settings, not all modalities are available for every sample, and acquiring additional modalities can be costly. This raises the question: which samples should be prioritized for additional modality acquisition when resources are limited? While prior work has explored individual-level acquisition strategies and training-time active learning paradigms, test-time and cohort-based acquisition remain underexplored. We introduce Cohort-based Active Modality Acquisition (CAMA), a novel test-time setting to formalize the challenge of selecting which samples should receive additional modalities. We derive acquisition strategies that leverage a combination of generative imputation and discriminative modeling to estimate the expected benefit of acquiring missing modalities based on common evaluation metrics. We also introduce upper-bound heuristics that provide performance ceilings to benchmark acquisition strategies. Experiments on multimodal datasets with up to 15 modalities demonstrate that our proposed imputation-based strategies can more effectively guide the acquisition of additional modalities for selected samples compared with methods relying solely on unimodal information, entropy-based guidance, or random selection. We showcase the real-world relevance and scalability of our method by demonstrating its ability to effectively guide the costly acquisition of proteomics data for disease prediction in a large prospective cohort, the UK Biobank (UKBB). Our work provides an effective approach for optimizing modality acquisition at the cohort level, enabling more effective use of resources in constrained settings.
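The cohort-level selection step described above can be illustrated with a minimal sketch. It assumes binary classification, treats each logit as a Bernoulli probability via a sigmoid, and scores a sample by the mean KL divergence between the predictions from its imputed logits and the prediction from its available modalities. The exact form of the paper's KL-based acquisition function is not given in this summary, so the direction of the divergence, the averaging over imputations, and all function names here (`kl_acquisition_scores`, `select_for_acquisition`) are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)), clipped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_acquisition_scores(s_avail, s_imp):
    """Score each sample by how far its imputed predictions move
    away from the prediction based on the available modalities.

    s_avail: one logit per sample (available modalities only).
    s_imp:   K imputed logits per sample (hypothetical layout).
    """
    scores = []
    for s_a, imputed in zip(s_avail, s_imp):
        p_avail = sigmoid(s_a)
        kls = [bernoulli_kl(sigmoid(s_k), p_avail) for s_k in imputed]
        scores.append(sum(kls) / len(kls))  # average over the K imputations
    return scores


def select_for_acquisition(scores, beta):
    """Return the indices of the top floor(beta * N) samples by score."""
    budget = int(beta * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy cohort of four samples with K = 2 imputations each (made-up numbers).
s_avail = [0.1, 2.0, -1.5, 0.0]
s_imp = [[0.2, 0.0], [-1.0, -2.0], [-1.4, -1.6], [3.0, 2.5]]

scores = kl_acquisition_scores(s_avail, s_imp)
chosen = select_for_acquisition(scores, beta=0.5)
print(chosen)
```

In this toy cohort the imputed logits for samples 1 and 3 disagree most with their available-modality logits, so those two samples are the ones selected under a 50% budget.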

Paper Structure

This paper contains 51 sections, 24 equations, 4 figures, 44 tables.

Figures (4)

  • Figure 1: Motivational example for CAMA determining the added value of obtaining the MRI modality. (A) A heterogeneous cohort for which each sample has $P$ distinct modalities. (B) Instead of using the initial subset logit scores $s_i^{\text{avail}}$, a generative model $f_{\text{imp}}$ imputes the target missing modality for every patient in the cohort. This yields imputed, augmented-modality logit scores $\{s_{i,k}^{\text{imp}}\}_{k=1}^{K}$ that approximate $s_i^{\text{acquired}}$, i.e., the counterfactual logits with only the imputed modality added. (C) An acquisition function (AF) uses these scores to rank samples by acquisition priority. The graph demonstrates how the global performance metric improves from the initial baseline towards the performance of a model with access to post-acquisition data, as an increasing fraction of the cohort receives the additional modality. This acquisition process is guided by the proposed strategies operating under the acquisition budget constraint $\beta$.
  • Figure 2: End-to-end architectures to determine the scores for different AFs in our proposed CAMA setting. (A) Vanilla LF architecture of a model $f$ that can handle missing data modalities by masking. The model creates scores $s_i^{\text{avail}}$ given the available modalities. (B) Architecture for training (left) and inference (right) with an LF model $f$ and a generative model $f_{\text{imp}}$ to create scores $s_i^{\text{imp}}$ for the imputation-based AF.
  • Figure 3: (a) AUROC curves for several AFs on the MOSEI dataset (Zadeh et al., 2018) at an acquisition budget of $25\%$ of the dataset size. (b) Acquisition performance of the best-performing AF from (a), visualizing the gain achieved during the progressive acquisition of modalities as the cohort transitions from pre-acquisition scores towards post-acquisition scores. Notably, the oracle AF can exceed the post-acquisition cohort's AUROC at certain fractions of acquired modalities before subsequently declining towards it again.
  • Figure 4: The latent DDPM with its (de)noising functions. Coloring represents less noise in the latent space, starting with pure noise in $X_{i,T}=X_{1,T}$ with $T$ steps. The DDPM is conditioned on two non-missing latent spaces, one from each of the remaining modalities.
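
The acquisition curves described in Figures 1(C) and 3(b) can be sketched as follows: starting from the pre-acquisition scores, each sample's score is replaced by its post-acquisition score in ranked order, and the cohort-level AUROC is recomputed after every step. This is a minimal illustration, not the paper's evaluation code. The helper names (`auroc`, `acquisition_curve`), the toy scores, and the fixed ranking are all hypothetical.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def acquisition_curve(labels, s_avail, s_acquired, ranking):
    """Cohort AUROC after each acquisition, following `ranking`.

    The first entry is the pre-acquisition AUROC; the last entry is the
    AUROC once every ranked sample has received the extra modality.
    """
    current = list(s_avail)
    curve = [auroc(labels, current)]
    for i in ranking:
        current[i] = s_acquired[i]  # sample i now has the extra modality
        curve.append(auroc(labels, current))
    return curve


# Toy cohort (made-up numbers): four samples, two positives, two negatives.
labels = [1, 0, 1, 0]
s_avail = [0.2, 0.4, 0.9, 0.1]
s_acquired = [0.8, 0.3, 0.7, 0.75]
ranking = [0, 1, 2, 3]  # hypothetical acquisition order

curve = acquisition_curve(labels, s_avail, s_acquired, ranking)
print(curve)  # [0.75, 1.0, 1.0, 1.0, 0.75]
```

In this toy example the mixed cohort reaches a higher AUROC at intermediate steps than the fully post-acquisition cohort, because the last sample's post-acquisition score is worse than its pre-acquisition one. This is one simple way the effect noted in the Figure 3 caption, where an AF's curve exceeds the post-acquisition AUROC before declining towards it, can arise.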