Cohort-Based Active Modality Acquisition

Tillmann Rheude; Roland Eils; Benjamin Wild

TL;DR

This work introduces Cohort-based Active Modality Acquisition (CAMA), a test-time, cohort-level framework for allocating a scarce, costly modality to a subset of samples so as to maximize a global predictive metric. It develops acquisition functions that combine generative imputation with discriminative scoring to estimate the counterfactual benefit of acquiring a modality, along with upper-bound benchmarks. Across four multimodal datasets, including a large-scale UK Biobank cohort, imputation-based strategies, particularly a KL-divergence-based (kldiv) approach, consistently outperform unimodal baselines, entropy-driven methods, and random selection, with robust gains under budget constraints. The study also offers architectural and training insights, showing that coherence between the generative and discriminative components and robust handling of imputed latents are essential for effective cohort-level modality allocation, with real-world applicability in healthcare and beyond.
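The TL;DR names a kldiv-based acquisition function without defining it. The following is a minimal sketch of one plausible reading, under stated assumptions: for each sample, a generative model (not shown) produces several imputations of the missing modality, a discriminative model turns each imputation into class probabilities, and the score is the mean KL divergence between these imputation-based predictions and the prediction from the observed modalities alone. The names `kldiv_score` and `select_cohort`, the direction of the divergence, and the top-B selection rule are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two discrete class distributions (eps guards q == 0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def kldiv_score(unimodal_probs: Sequence[float],
                imputed_probs: Sequence[Sequence[float]]) -> float:
    """Hypothetical acquisition score: mean KL divergence of each imputation-based
    prediction from the prediction that uses only the observed modalities."""
    return sum(kl_divergence(p, unimodal_probs) for p in imputed_probs) / len(imputed_probs)


def select_cohort(scores: Sequence[float], budget: int) -> list[int]:
    """Return the indices of the `budget` highest-scoring samples, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Toy cohort: per sample, the unimodal prediction and the predictions obtained
# with K imputed versions of the missing modality (K may differ per sample here).
cohort = [
    ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]),  # imputations change nothing
    ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]),  # imputations disagree strongly
    ([0.8, 0.2], [[0.7, 0.3]]),              # imputation shifts prediction slightly
]
scores = [kldiv_score(uni, imp) for uni, imp in cohort]
print([round(s, 3) for s in scores])    # [0.0, 0.368, 0.028]
print(select_cohort(scores, budget=1))  # [1]
```

In this toy cohort, sample 1 scores highest even though its average imputed prediction equals its unimodal prediction: the individual imputations disagree strongly, which this score treats as a sign that acquiring the real modality could change the prediction.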

Abstract

Real-world machine learning applications often involve data from multiple modalities that must be integrated effectively to make robust predictions. However, in many practical settings, not all modalities are available for every sample, and acquiring additional modalities can be costly. This raises the question: which samples should be prioritized for additional modality acquisition when resources are limited? While prior work has explored individual-level acquisition strategies and training-time active learning paradigms, test-time and cohort-based acquisition remain underexplored. We introduce Cohort-based Active Modality Acquisition (CAMA), a novel test-time setting to formalize the challenge of selecting which samples should receive additional modalities. We derive acquisition strategies that leverage a combination of generative imputation and discriminative modeling to estimate the expected benefit of acquiring missing modalities based on common evaluation metrics. We also introduce upper-bound heuristics that provide performance ceilings to benchmark acquisition strategies. Experiments on multimodal datasets with up to 15 modalities demonstrate that our proposed imputation-based strategies can more effectively guide the acquisition of additional modalities for selected samples compared with methods relying solely on unimodal information, entropy-based guidance, or random selection. We showcase the real-world relevance and scalability of our method by demonstrating its ability to guide the costly acquisition of proteomics data for disease prediction in a large prospective cohort, the UK Biobank (UKBB). Our work provides an effective approach for optimizing modality acquisition at the cohort level, enabling better use of resources in constrained settings.
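The abstract describes upper-bound heuristics that provide performance ceilings. As a minimal sketch, assuming a classification task evaluated by accuracy, an oracle that can see both the true labels and the predictions made after actually acquiring the modality can rank samples by how much acquisition raises the log-probability of the true class. The function names and the log-probability gain criterion are hypothetical; the paper's heuristics may be defined differently, and this ranking is a heuristic ceiling rather than a guaranteed optimum. Because the oracle uses information that is unavailable at test time, it only serves as a benchmark for deployable strategies.

```python
import math
from typing import Sequence


def argmax(probs: Sequence[float]) -> int:
    """Index of the most probable class (first index wins ties)."""
    return max(range(len(probs)), key=lambda k: probs[k])


def cohort_accuracy(uni_probs, multi_probs, labels, selected) -> float:
    """Accuracy when selected samples use the multimodal prediction and all
    other samples keep their unimodal prediction."""
    chosen = set(selected)
    correct = 0
    for i, y in enumerate(labels):
        probs = multi_probs[i] if i in chosen else uni_probs[i]
        correct += int(argmax(probs) == y)
    return correct / len(labels)


def oracle_selection(uni_probs, multi_probs, labels, budget: int) -> list[int]:
    """Hypothetical upper-bound heuristic: rank samples by the gain in
    log-probability of the true label after acquisition. Requires labels and
    the acquired modality; assumes all probabilities are strictly positive."""
    gains = [math.log(multi_probs[i][y]) - math.log(uni_probs[i][y])
             for i, y in enumerate(labels)]
    ranked = sorted(range(len(labels)), key=lambda i: gains[i], reverse=True)
    return sorted(ranked[:budget])


labels = [0, 1, 1, 0]
uni = [[0.6, 0.4], [0.55, 0.45], [0.3, 0.7], [0.45, 0.55]]
multi = [[0.7, 0.3], [0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]

chosen = oracle_selection(uni, multi, labels, budget=2)
print(chosen)                                       # [1, 3]
print(cohort_accuracy(uni, multi, labels, []))      # 0.5
print(cohort_accuracy(uni, multi, labels, chosen))  # 1.0
```

Here the oracle spends its budget of two on the samples whose predictions flip from wrong to right, raising cohort accuracy from 0.5 to 1.0. Strategies that see only the observed modalities, or random selection, can then be judged by how much of this gap they close at the same budget.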
