EVA: Towards a universal model of the immune system
Ethan Bandasack, Vincent Bouget, Apolline Bruley, Yannis Cattan, Charlotte Claye, Matthew Corney, Julien Duquesne, Karim El Kanbi, Aziz Fouché, Pierre Marschall, Francesco Strozzi
TL;DR
EVA is a cross-species, multimodal foundation model for immunology and inflammation that unifies human and mouse transcriptomics with histology data to produce patient-level representations. By jointly training a transcriptomics encoder (EVA-RNA) and a histology encoder (EVA-H) and fusing them with a multimodal transformer, EVA demonstrates strong, transferable performance across a 39-task benchmark spanning discovery, preclinical translation, and clinical prediction, with clear scaling laws up to 300M RNA parameters. Mechanistic interpretability via sparse autoencoders reveals interpretable, cross-species concepts that map to conserved immune programs and tissue differentiation. EVA’s zero-shot perturbation, cross-species transfer, and end-to-end translational evaluations address key translational barriers, and an open-access EVA-RNA release is provided to accelerate community research in immune-mediated diseases. The work highlights the value of modality- and species-aware pretraining for translational biology and sets a benchmarking paradigm aligned with drug development priorities.
Abstract
The effective application of foundation models to translational research in immune-mediated diseases requires multimodal patient-level representations that can capture complex phenotypes emerging from multicellular interactions. Yet most current biological foundation models focus only on single-cell resolution and are evaluated on technical metrics that are often disconnected from actual drug development tasks and challenges. Here, we introduce EVA, the first cross-species, multimodal foundation model of immunology and inflammation, a therapeutic area where shared pathogenic mechanisms create unique opportunities for transfer learning. EVA harmonizes transcriptomics data across species, platforms, and resolutions, and integrates histology data to produce rich, unified patient representations. We establish clear scaling laws, demonstrating that increasing model size and compute translates to improvements in both pretraining and downstream task performance. We introduce a comprehensive evaluation suite of 39 tasks spanning the drug development pipeline: zero-shot target efficacy and gene function prediction for discovery, cross-species or cross-disease molecular perturbations for preclinical development, and patient stratification with treatment response prediction or disease activity prediction for clinical trial applications. We benchmark EVA against several state-of-the-art biological foundation models and baselines on these tasks, and demonstrate state-of-the-art results in each task category. Using mechanistic interpretability, we further identify biologically meaningful features, revealing intertwined representations across species and technologies. We release an open version of EVA for transcriptomics to accelerate research on immune-mediated diseases.
