EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning

Jingfeng Yao, Xinggang Wang, Yuehao Song, Huangxuan Zhao, Jun Ma, Yajie Chen, Wenyu Liu, Bo Wang

TL;DR

EVA-X is a chest X-ray foundation model trained with self-supervised learning that merges contrastive learning and masked image modeling to capture semantic and geometric information from unlabeled X-ray data. With three sizes (Ti, S, B) and a frozen tokenizer, EVA-X achieves state-of-the-art transfer performance across 11 chest-disease tasks and is data-efficient: for example, it reaches 95% accuracy with 1% of the COVID-19 training data. The model demonstrates strong classification, segmentation, and interpretability, evidenced by superior Grad-CAM localization and high Dice/Jaccard scores across multiple datasets. These results indicate EVA-X’s potential as a general, annotation-efficient medical foundation model for chest X-ray analysis and point toward broader applications in medical imaging with scalable self-supervised pre-training.

Abstract

The diagnosis and treatment of chest diseases play a crucial role in maintaining human health. X-ray examination has become the most common means of clinical examination due to its efficiency and cost-effectiveness. Artificial intelligence methods for analyzing chest X-ray images are limited by insufficient annotated data and varying levels of annotation, resulting in weak generalization ability and difficulty in clinical dissemination. Here we present EVA-X, an innovative foundation model based on X-ray images with broad applicability to various chest disease detection tasks. EVA-X is the first X-ray-image-based self-supervised learning method capable of capturing both semantic and geometric information from unlabeled images for universal X-ray image representation. Through extensive experimentation, EVA-X has demonstrated exceptional performance in chest disease analysis and localization, becoming the first model capable of spanning over 20 different chest diseases and achieving leading results in over 11 different detection tasks in the medical field. Additionally, EVA-X significantly reduces the burden of data annotation in the medical AI field, showcasing strong potential in the domain of few-shot learning. The emergence of EVA-X will greatly propel the development and application of medical foundation models, bringing about revolutionary changes in future medical research and clinical practice. Our code and models are available at: https://github.com/hustvl/EVA-X.
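
The TL;DR mentions masked image modeling with a frozen tokenizer but does not spell out the objective. The block below is a minimal sketch of one common way such an objective is written, not the authors' implementation. It assumes the frozen tokenizer produces one feature vector per image patch and that the model is trained to match those features at the masked positions under a cosine-similarity loss. The function name `masked_feature_loss`, the patch count, the feature width, and the 50% mask ratio are all hypothetical choices made for illustration.

```python
import numpy as np

def masked_feature_loss(student_feats, target_feats, mask):
    """Mean (1 - cosine similarity) over the masked patches only.

    student_feats, target_feats: arrays of shape (num_patches, dim).
    mask: boolean array of shape (num_patches,), True where a patch was masked.
    Assumption: targets come from a frozen tokenizer and receive no gradient.
    """
    if not mask.any():
        raise ValueError("mask selects no patches")
    s = student_feats[mask]
    t = target_feats[mask]
    # L2-normalize each patch feature so the dot product is a cosine similarity.
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + 1e-8)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))

rng = np.random.default_rng(0)
num_patches, dim = 196, 32                      # hypothetical sizes
target = rng.normal(size=(num_patches, dim))    # stand-in for frozen tokenizer output
mask = rng.random(num_patches) < 0.5            # hypothetical 50% mask ratio

# A student that reproduces the targets exactly incurs (numerically) zero loss.
perfect_loss = masked_feature_loss(target, target, mask)
# An unrelated random student sits near 1, since random vectors are nearly orthogonal.
random_loss = masked_feature_loss(rng.normal(size=(num_patches, dim)), target, mask)
```

Restricting the loss to masked positions is what makes the task predictive: the model cannot copy visible patch content and has to infer the missing regions from context.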

Paper Structure

This paper contains 35 sections, 11 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: (a) Pre-training Dataset: EVA-X pre-training collects and leverages a diverse set of X-ray images encompassing various health conditions [cxr14, chexpert, mimic]. (b) EVA-X Pre-training: EVA-X employs a novel self-supervised pre-training approach that integrates the strengths of contrastive learning [mgca, biovil, medklip, convirt, gloria] and masked image modeling [selfmedmae, xiao2023delving]. (c) General Visual Representations: EVA-X exhibits a high degree of transferability, enhancing the comprehensive analysis of X-ray imagery. (d) Transfer Performance: EVA-X demonstrates state-of-the-art performance across 11 distinct tasks [cxr14, chexpert, hemdan2020covidx, zhang2023lung, siim, RSNA, jaeger2014two], outperforming benchmarks set by previous pre-trained models.
  • Figure 2: Overview of EVA-X self-supervised pre-training. EVA-X uses a self-supervised pre-training method that combines the advantages of contrastive learning and masked image modeling for chest X-ray images. See the method section for details.
  • Figure 3: (a) Performance and Efficiency of EVA-X Pre-trained Models. Among all pre-trained models [vit, resnet, densenet, medklip, mgca, biovil, medklip, selfmedmae, xiao2023delving, eva02], EVA-X-B achieves the highest performance. The EVA-X family demonstrates an excellent balance between performance and computational efficiency compared to previous methods. (b) Performance on Chest Diseases Classification. EVA-X achieves the best performance in both multi-label and single-label classification tasks for chest diseases [cxr14, chexpert, hemdan2020covidx]. (c) Performance on Label-efficient Classification. EVA-X shows superior performance across varying amounts of training data, with a particularly notable advantage observed when dealing with extremely limited training data.
  • Figure 4: (a) Performance on Chest X-ray Segmentation. EVA-X surpasses 6 other pre-trained models [resnet, medklip, mgca, deit, biovil, xiao2023delving] across all segmentation benchmarks [zhang2023lung, RSNA, siim, jaeger2014two], exhibiting superior performance on Dice and Jaccard metrics. (b) Visualization of Segmentation Results. EVA-X demonstrates enhanced accuracy and finer masks across all segmentation tasks.
  • Figure 5: (a) Performance on Weakly-Supervised Localization. EVA-X delivers the highest overall performance across all four metrics for weakly supervised localization tasks [cxr14]. (b) Visualization of Grad-CAM. Class Activation Map (CAM) [gradcam] is an important interpretation method for deep learning models. The visualization illustrates that EVA-X can localize diseases using only classification annotations, showcasing remarkable interpretability.
  • ...and 3 more figures
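
The Figure 4 caption reports segmentation quality with Dice and Jaccard scores. As a reminder of what those two overlap metrics measure, here is a small sketch using their standard definitions on binary masks. The helper name `dice_and_jaccard` and the toy masks are illustrative only, and returning 1.0 for two empty masks is an assumed convention rather than something the source specifies.

```python
import numpy as np

def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / total       # 2|A ∩ B| / (|A| + |B|)
    jaccard = intersection / union          # |A ∩ B| / |A ∪ B|
    return float(dice), float(jaccard)

# Toy 2x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 in common.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
dice, jaccard = dice_and_jaccard(pred, target)
# dice = 2*2 / (3+3) = 0.667 (rounded), jaccard = 2 / 4 = 0.5
```

The two scores are monotonically related by Dice = 2J / (1 + J), so they always rank a pair of models the same way on a single mask, although averages over a dataset can differ.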