BioVERSE: Representation Alignment of Biomedical Modalities to LLMs for Multi-Modal Reasoning
Ching-Huei Tsou, Michal Ozery-Flato, Ella Barkan, Diwakar Mahajan, Ben Shapira
TL;DR
BioVERSE addresses the problem of siloed biomedical embeddings with a modular encoder–projector–LLM framework: modality-specific BioFM embeddings are projected into the LLM’s token space and treated as special tokens for joint reasoning. Training has two stages: alignment (autoregressive or contrastive), followed by light instruction tuning with LoRA. This enables zero-shot cross-modal tasks across scRNA-seq, proteins, and small molecules. Across cell-type annotation, molecular description, and protein-oriented text generation, BioVERSE with compact backbones matches or surpasses larger text-only baselines while producing richer, explainable outputs and remaining practical to deploy. The approach is modular and extensible, supporting on-prem deployment and future expansion to additional modalities and backbones, with open-sourcing intended to foster community benchmarking and progress in embedding-aware biomedical reasoning.
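The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single linear layer, the dimensions, the placeholder token id 99, and the helper names (make_projector, project, splice) are all assumptions chosen for illustration, and a real system would use a tensor library and a trained projector.

```python
import random

def make_projector(d_in, d_out, seed=0):
    """Hypothetical linear projector (W, b) from BioFM space to LLM hidden space."""
    rng = random.Random(seed)
    scale = 1.0 / d_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weights, bias

def project(embedding, projector):
    """Apply the linear map: one BioFM embedding becomes one LLM-sized vector."""
    weights, bias = projector
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]

def splice(token_vectors, token_ids, placeholder_id, projected):
    """Replace the placeholder token's vector with the projected embedding."""
    return [projected if tid == placeholder_id else vec
            for tid, vec in zip(token_ids, token_vectors)]

# Assumed toy sizes: an 8-dim BioFM embedding and a 4-dim LLM hidden state.
projector = make_projector(d_in=8, d_out=4)
cell_embedding = [0.1] * 8                 # stand-in for an scRNA-seq encoder output
projected = project(cell_embedding, projector)

token_ids = [11, 99, 12]                   # 99 is an assumed special-token id
token_vectors = [[0.0] * 4 for _ in token_ids]
inputs = splice(token_vectors, token_ids, placeholder_id=99, projected=projected)
print(len(inputs), len(inputs[1]))
```

The sequence length is unchanged; only the vector at the special-token position is swapped, so the LLM consumes the biological embedding alongside ordinary text tokens.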
Abstract
Recent advances in large language models (LLMs) and biomedical foundation models (BioFMs) have achieved strong results in biological text reasoning, molecular modeling, and single-cell analysis, yet they remain siloed in disjoint embedding spaces, limiting cross-modal reasoning. We present BioVERSE (Biomedical Vector Embedding Realignment for Semantic Engagement), a two-stage approach that adapts pretrained BioFMs as modality encoders and aligns them with LLMs through lightweight, modality-specific projection layers. The approach first aligns each modality to a shared LLM space through independently trained projections, allowing them to interoperate naturally, and then applies standard instruction tuning with multi-modal data to bring them together for downstream reasoning. By unifying raw biomedical data with knowledge embedded in LLMs, the approach enables zero-shot annotation, cross-modal question answering, and interactive, explainable dialogue. Across tasks spanning cell-type annotation, molecular description, and protein function reasoning, compact BioVERSE configurations surpass larger LLM baselines while enabling richer, generative outputs than existing BioFMs, establishing a foundation for principled multi-modal biomedical reasoning.
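For the alignment stage, the TL;DR names a contrastive objective as one option. The sketch below shows what such an objective could look like, assuming a symmetric InfoNCE loss of the kind used in CLIP-style training. The source does not state which contrastive loss is used, so the function name info_nce, the temperature of 0.07, and the toy vectors are illustrative assumptions only.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def info_nce(bio_vecs, text_vecs, temperature=0.07):
    """Symmetric InfoNCE: pair i of bio_vecs should match pair i of text_vecs."""
    bio = [l2_normalize(v) for v in bio_vecs]
    txt = [l2_normalize(v) for v in text_vecs]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(bi, tj)) / temperature for tj in txt]
              for bi in bio]

    def mean_cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            top = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = top + math.log(sum(math.exp(x - top) for x in row))
            total += log_sum_exp - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the bio-to-text and text-to-bio directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(columns))

# Toy projected embeddings and text embeddings in a shared 2-dim space.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < shuffled_loss)
```

In this toy setting the loss is small when each projected embedding points in the same direction as its paired text embedding and large when the pairs are swapped, which is the signal a projector would be trained on under this assumed objective.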
