Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

Jianfa Bai; Kejin Lu; Runtian Yuan; Qingqiu Li; Jilan Xu; Junlin Hou; Yuejie Zhang; Rui Feng

Abstract

Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we propose a three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. First, we build a lung-aware 3D expert by combining original CT volumes and lung-extracted CT volumes for volumetric classification. Second, we develop two MedSigLIP-based experts: a slice-wise representation and probability learning module, and a Transformer-based inter-slice context modeling module for capturing cross-slice dependency. Third, we train a source classifier to predict the latent source identity of each test scan. By leveraging the predicted source information, we perform model fusion and voting based on different experts. On the validation set covering all four sources, the Stage 1 model achieves the best macro-F1 of 0.9711, ACC of 0.9712, and AUC of 0.9791. Stage 2a and Stage 2b achieve the best AUC scores of 0.9864 and 0.9854, respectively. The Stage 3 source classifier reaches 0.9107 ACC and 0.9114 F1. These results demonstrate that source-aware expert modeling and hierarchical voting provide an effective solution for robust COVID-19 CT classification under heterogeneous multi-source conditions.

Paper Structure (16 sections, 5 equations, 1 figure, 5 tables)

Introduction
Methodology
Overview
Stage 1: Lung-Aware 3D Training for COVID-19 Classification
Stage 2a: Slice-Wise Representation and Probability Learning
Stage 2b: Transformer-Based Inter-Slice Context Modeling
Stage 3: Source Discrimination and Source-Specific Expert Inference
Datasets and Experiments
Datasets
Experiments
Stage 1: Lung-Aware 3D Classification
Stage 2a: Slice-Wise Representation and Probability Learning
Stage 2b: Transformer-Based Inter-Slice Context Modeling
Stage 3: Source Classification
Conclusion
...and 1 more sections

Figures (1)

Figure 1: Overview of the proposed three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. Stage 1 builds a lung-aware 3D expert for volumetric classification. Stage 2 introduces two MedSigLIP-based 2D experts for slice-wise probability learning and inter-slice context modeling. Stage 3 performs source discrimination and source-specific expert inference, where source 0 is handled by the 3D expert and sources 1, 2, and 3 are jointly inferred by multiple experts through fusion and voting.
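
The source-specific inference described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `predict_scan`, the 0.5 threshold, and the use of a plain majority vote over three experts are assumptions made for illustration, since the exact fusion and voting rule is not specified here. The only behavior taken from the text is that source 0 is routed to the 3D expert alone, while the other sources combine several experts.

```python
def predict_scan(source_id, p_3d, p_slice, p_context, threshold=0.5):
    """Return 1 (COVID) or 0 (non-COVID) for a single scan.

    source_id : source predicted by the Stage 3 classifier (0, 1, 2, or 3)
    p_3d      : COVID probability from the Stage 1 lung-aware 3D expert
    p_slice   : COVID probability from the Stage 2a slice-wise expert
    p_context : COVID probability from the Stage 2b inter-slice expert

    The threshold and the majority-vote rule are hypothetical choices.
    """
    if source_id == 0:
        # Source 0 is handled by the 3D expert alone.
        return int(p_3d >= threshold)

    # Sources 1, 2, 3: each expert casts a hard vote, majority wins (assumed rule).
    votes = [int(p >= threshold) for p in (p_3d, p_slice, p_context)]
    return int(sum(votes) >= 2)


if __name__ == "__main__":
    # Same expert outputs, different predicted source, different routing.
    print(predict_scan(0, p_3d=0.9, p_slice=0.1, p_context=0.1))
    print(predict_scan(2, p_3d=0.9, p_slice=0.1, p_context=0.1))
```

In this sketch the same three probabilities can yield different decisions depending on the predicted source, which is the point of routing on the Stage 3 output before combining experts.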
