OmniRad: A Radiological Foundation Model for Multi-Task Medical Image Analysis
Luca Zedda, Andrea Loddo, Cecilia Di Ruberto
TL;DR
OmniRad addresses the need for unified, transferable visual representations in radiology. A single radiological encoder is pretrained with self-supervision on heterogeneous data and then reused across classification and segmentation tasks, with exploratory tests on image captioning. To preserve efficiency, the approach pairs a radiomics-informed, stable representation with lightweight task-specific adapters and a lightweight segmentation decoder. OmniRad achieves state-of-the-art or competitive performance on the MedMNISTv2, MedSegBench, and ROCOv2 benchmarks, with consistent gains, especially on anatomically diverse and multi-modal datasets, and its latent space shows qualitatively favorable structure. The work suggests a practical path toward a unified radiological foundation model that supports multi-task pipelines in real-world clinical settings, reducing task-specific retraining while maintaining robust performance.
Abstract
Radiological analysis increasingly benefits from pretrained visual representations that can support heterogeneous downstream tasks across imaging modalities. In this work, we introduce OmniRad, a self-supervised radiological foundation model pretrained on 1.2 million medical images and designed around radiology-inspired principles that emphasize representation reuse and cross-task transferability. We evaluate the pretrained encoder under multiple downstream adaptation regimes, including lightweight task-specific adapters on a frozen backbone as well as full end-to-end fine-tuning for classification, which lets us assess both representation quality and task-specific performance. The evaluation covers a broad suite of public benchmarks spanning classification and segmentation across multiple modalities. On the MedMNISTv2 collection, OmniRad improves classification F1 by up to 2.05% over competing foundation models. For dense prediction, OmniRad improves the mean Dice score across six MedSegBench datasets when using frozen representations. Qualitative analyses and latent-space visualizations suggest improved feature clustering and modality-related separation.
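
For readers unfamiliar with the segmentation metric mentioned above, the following is a minimal sketch of the standard Dice coefficient for binary masks, Dice = 2|P ∩ T| / (|P| + |T|). It is an illustration only, not the evaluation code used for OmniRad. The function name `dice_score`, the flat 0/1 list representation of masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (an assumed representation chosen to keep the sketch dependency-free).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # |P ∩ T|: pixels that are foreground in both masks
    intersection = sum(p * t for p, t in zip(pred, target))
    # |P| + |T|: total foreground pixels in each mask
    total = sum(pred) + sum(target)

    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: the prediction marks two pixels, the reference marks one,
# and they overlap on one pixel, so Dice = 2 * 1 / (2 + 1).
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

A mean Dice score over several datasets, as reported in the abstract, would then be the average of such per-dataset scores; how scores are aggregated within each dataset is not specified here.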
