Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann LeCun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, Maria Vakalopoulou
TL;DR
RayDINO addresses the need for robust, fair, and holistic chest X-ray analysis using a large self-supervised vision transformer trained on 873k images. By freezing the 307M-parameter backbone and training lightweight task adapters, the approach delivers state-of-the-art performance across 21 benchmarks spanning classification (AUROC), segmentation (mDice), and radiology report generation, while enabling strong out-of-domain generalization and bias auditing. The work highlights the advantages of self-supervised pretraining for patient-centric AI, offering interpretable attention maps and consistent performance on unseen populations and new diseases like COVID-19. Its demonstrated cross-population generalization, fairness analysis, and clinical applicability suggest significant potential for scalable radiology support in diverse settings, including low-resource regions and nonstandard exam distributions. Overall, RayDINO advances robust, versatile radiology AI by combining holistic imaging representations with minimal task-specific supervision and explicit interpretability.
Abstract
AI foundation models are gaining traction in various applications, including medical fields like radiology. However, medical foundation models are often tested on limited tasks, leaving their generalizability and biases unexplored. We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays. We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation, and provide an in-depth analysis of the population, age, and sex biases of our model. Our findings suggest that self-supervision enables patient-centric AI that proves useful in clinical workflows and interprets X-rays holistically. With RayDINO and small task-specific adapters, we reach state-of-the-art results and improve generalization to unseen populations while mitigating bias, illustrating the true promise of foundation models: versatility and robustness.
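The TL;DR reports classification results as AUROC and segmentation results as mDice. As a reminder of what these two metrics measure, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the function names `auroc`, `dice`, and `mean_dice` are hypothetical, mDice is assumed to be the unweighted mean of per-class Dice scores, and the handling of classes absent from both maps is an assumed convention.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def dice(pred: Sequence[int], target: Sequence[int], cls: int) -> float:
    """Dice coefficient for one class over flattened label maps."""
    p = [x == cls for x in pred]
    t = [x == cls for x in target]
    inter = sum(1 for a, b in zip(p, t) if a and b)
    denom = sum(p) + sum(t)
    # Assumed convention: a class absent from both maps scores 1.0.
    return 1.0 if denom == 0 else 2.0 * inter / denom


def mean_dice(pred: Sequence[int], target: Sequence[int],
              classes: Sequence[int]) -> float:
    """Unweighted mean of per-class Dice scores (assumed mDice definition)."""
    return sum(dice(pred, target, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy binary classification example: labels and predicted scores.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # Toy 3-class segmentation example on a flattened 4-pixel map.
    print(mean_dice([0, 1, 1, 2], [0, 1, 2, 2], classes=[0, 1, 2]))
```

The pairwise AUROC is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be the usual choice at benchmark scale.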
