Robust COVID-19 Detection in CT Images with CLIP

Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu

TL;DR

This work introduces the first lightweight detector designed to overcome obstacles in medical imaging, leveraging a frozen CLIP image encoder and a trainable multilayer perceptron (MLP) to achieve superior performance despite the inherent data limitations.

Abstract

In medical imaging, and in COVID-19 detection in particular, deep learning models face substantial challenges: the need for extensive computational resources, a scarcity of well-annotated datasets, and large amounts of data that remain unlabeled. In this work, we introduce the first lightweight detector designed to overcome these obstacles, leveraging a frozen CLIP image encoder and a trainable multilayer perceptron (MLP). Enhanced with Conditional Value at Risk (CVaR) for robustness and a loss landscape flattening strategy for improved generalization, our model is tailored for high efficacy in COVID-19 detection. Furthermore, we integrate a teacher-student framework to capitalize on the vast amounts of unlabeled data, enabling our model to achieve superior performance despite the inherent data limitations. Experimental results on the COV19-CT-DB dataset demonstrate the effectiveness of our approach, which surpasses the baseline by up to 10.6% in 'macro' F1 score in supervised learning. The code is available at https://github.com/Purdue-M2/COVID-19_Detection_M2_PURDUE.
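To make the CVaR idea in the abstract concrete, the following is a minimal sketch of the standard empirical CVaR objective (the Rockafellar-Uryasev form), which for a confidence level $\alpha$ averages roughly the worst $\alpha$-fraction of per-sample losses. It is an illustration under stated assumptions, not the authors' implementation: the function name `cvar_loss` is hypothetical, the per-sample losses are made-up numbers, and the exact formulation used in the paper may differ.

```python
def cvar_loss(losses, alpha):
    """Empirical CVaR of per-sample losses (illustrative sketch).

    Computes min over lam of: lam + (1 / (alpha * n)) * sum(max(l - lam, 0)).
    With alpha = 1 this reduces to the plain average loss; smaller alpha
    focuses the objective on the hardest samples.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(losses)

    def objective(lam):
        excess = sum(max(l - lam, 0.0) for l in losses)
        return lam + excess / (alpha * n)

    # The objective is convex and piecewise linear in lam, with kinks only at
    # the observed losses, so evaluating it there finds the exact minimum.
    return min(objective(lam) for lam in losses)


# Hypothetical per-sample losses, e.g. binary cross-entropy values in a batch.
per_sample = [1.0, 2.0, 3.0, 4.0]
print(cvar_loss(per_sample, alpha=1.0))  # 2.5, the plain average
print(cvar_loss(per_sample, alpha=0.5))  # 3.5, the average of the two largest
```

In a training loop, a quantity like this would replace the mean loss over the batch, so that gradient updates are driven by the samples the classifier currently handles worst.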

Paper Structure

This paper contains 18 sections, 3 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison between our method and the traditional method. First row: The traditional method trains a whole deep learning model (e.g., a CNN) with a binary cross-entropy loss $\mathcal{L}_{BCE}$. Second row: Our method enhances COVID-19 detection by utilizing a frozen CLIP encoder and a lightweight MLP classifier with a Conditional Value at Risk (CVaR) loss $\mathcal{L}_{CVaR}$ across a flattened loss landscape.
  • Figure 2: Overview of our proposed model, which uses a CLIP ViT to encode the input images, an MLP module with a robust CVaR loss, and an optimization step involving a flattened loss landscape to distinguish COVID-19 cases from non-COVID-19 cases.
  • Figure 3: Diagram of our robust model with a teacher-student framework that leverages unlabeled data to enhance detection performance.
  • Figure 4: Loss landscape visualization of our proposed method without (left) and with (right) sharpness-aware minimization (SAM). The axis scales are the same in both plots.
  • Figure 5: 'Macro' F1 score for different $\alpha$ values.
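
The 'macro' F1 score reported in the abstract and in Figure 5 is the unweighted mean of the per-class F1 scores, so the COVID-19 and non-COVID-19 classes count equally regardless of how many samples each has. A minimal sketch of that computation follows; the function name `macro_f1`, the label encoding (1 for COVID-19, 0 for non-COVID-19), and the toy predictions are assumptions made for illustration only.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when the class never
        # appears in either the labels or the predictions.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with assumed encoding: 1 = COVID-19, 0 = non-COVID-19.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the COVID-19 class has F1 = 2/3 and the non-COVID-19 class has F1 = 0.8, so the macro average is about 0.733 even though the classes have different sizes.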