MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin

Minrui Chen, Yi Zhou, Huidong Jiang, Yuhan Zhu, Guanjie Zou, Minqi Chen, Rong Tian, Hiroto Saigo

TL;DR

Fever of unknown origin (FUO) remains a persistent diagnostic challenge because of its diverse etiologies and the high-dimensional data involved. MedMimic addresses this by extracting $^{18}$F-FDG PET/CT imaging features with PCA or pretrained models (ResNet-18, ViT, DINOv2) and fusing them with standardized clinical data through a learnable self-attention-based multimodal fusion classification network (MFCN). In a retrospective study of 416 FUO cases, the method achieves macro-AUROC values of 0.8654 to 0.9291 across seven diagnostic tasks, outperforming traditional ML and single-modality DL baselines. Ablation studies and 5-fold cross-validation support the robustness of the approach and highlight the benefit of Transformer-based features and end-to-end multimodal integration for early, accurate FUO classification. The work shows the potential of combining pretrained large-model representations with clinician-inspired fusion for diagnostic decision support in complex, multimodal clinical problems.

Abstract

Fever of unknown origin (FUO) remains a diagnostic challenge. MedMimic is introduced as a multimodal framework inspired by real-world diagnostic processes. It uses pretrained models such as DINOv2, Vision Transformer, and ResNet-18 to convert high-dimensional $^{18}$F-FDG PET/CT imaging into low-dimensional, semantically meaningful features. A learnable self-attention-based fusion network then integrates these imaging features with clinical data for classification. Using 416 FUO patient cases from Sichuan University West China Hospital from 2017 to 2023, the multimodal fusion classification network (MFCN) achieved macro-AUROC scores ranging from 0.8654 to 0.9291 across seven tasks, outperforming conventional machine learning and single-modality deep learning methods. Ablation studies and five-fold cross-validation further validated its effectiveness. By combining the strengths of pretrained large models and deep learning, MedMimic offers a promising solution for disease classification.

Paper Structure

This paper contains 32 sections, 4 equations, 20 figures, 10 tables, 3 algorithms.

Figures (20)

  • Figure 1: A diagnostic process of the proposed MedMimic framework.
  • Figure 2: Flowchart of the inclusion and exclusion criteria.
  • Figure 3: The process of clinical data preparation. Each patient $i$ is represented by a CT image set $\mathcal{N}_i$, a PET image set $\mathcal{M}_i$, a clinical feature vector $\mathbf{a}_i$, and a one-hot label $y_i$.
  • Figure 4: The process of image feature extraction. For each extractor $k$ (PCA, ResNet-18, ViT, or DINOv2), features are computed per slice. For a patient $i$ with $N_i$ CT slices, the slice features are stacked into $\mathbf{F}^{\text{CT}}_{ki}$. Zero-padding along the slice axis extends each patient's stack to $N_{\max} = \max_i\{N_i\}$ slices, forming $\mathcal{F}_k^{\text{CT}}$. PET slices are processed in the same way to yield $\mathcal{F}_k^{\text{PET}}$.
  • Figure 5: The process of PCA-based feature extraction. Each CT slice $\mathbf{X}^{\text{CT}}$ is flattened into a vector, and the vectors are stacked into a data matrix. After the per-pixel mean is subtracted, PCA is applied via SVD and the top $b_1$ eigenvectors are retained, reducing the dimensionality of the data.
  • ...and 15 more figures
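
The PCA and zero-padding steps described in the captions of Figures 4 and 5 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`pca_fit`, `pca_transform`, `pad_and_stack`), the choice to fit PCA on slices pooled across patients, and the toy slice size are all assumptions made for illustration. Only the steps themselves (flatten, subtract the per-pixel mean, SVD, keep the top $b_1$ directions, zero-pad to $N_{\max}$) come from the captions.

```python
import numpy as np


def pca_fit(slices, b1):
    """Fit PCA on a stack of slices with shape (n_slices, H, W).

    Returns the per-pixel mean and the top-b1 principal directions,
    with components of shape (b1, H * W).
    """
    X = slices.reshape(slices.shape[0], -1).astype(np.float64)  # flatten each slice
    mean = X.mean(axis=0)                                       # per-pixel mean
    # Rows of Vt are the eigenvectors of the covariance of the centered data,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:b1]


def pca_transform(slices, mean, components):
    """Project slices (n_slices, H, W) onto the retained directions -> (n_slices, b1)."""
    X = slices.reshape(slices.shape[0], -1).astype(np.float64)
    return (X - mean) @ components.T


def pad_and_stack(per_patient_features):
    """Zero-pad each (N_i, d) feature matrix to N_max rows and stack to (P, N_max, d)."""
    n_max = max(f.shape[0] for f in per_patient_features)
    dim = per_patient_features[0].shape[1]
    out = np.zeros((len(per_patient_features), n_max, dim))
    for i, f in enumerate(per_patient_features):
        out[i, : f.shape[0]] = f  # rows beyond N_i stay zero
    return out


# Toy example: three hypothetical patients with 5, 8, and 3 slices of 16x16 pixels.
rng = np.random.default_rng(0)
patients = [rng.normal(size=(n, 16, 16)) for n in (5, 8, 3)]

# Assumption: PCA is fitted once on all slices pooled across patients.
mean, components = pca_fit(np.concatenate(patients), b1=4)
features = [pca_transform(p, mean, components) for p in patients]
stacked = pad_and_stack(features)
print(stacked.shape)  # (patients, N_max, b1)
```

In this sketch the padded rows are exact zeros, so a downstream model would need a mask or a similar mechanism to tell real slices from padding; whether and how MedMimic handles this is not stated in the captions.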