
MedPrompt: LLM-CNN Fusion with Weight Routing for Medical Image Segmentation and Classification

Shadman Sobhan, Kazi Abrar Mahmud, Abduz Zami

TL;DR

MedPrompt addresses the fragmentation of medical image analysis by unifying high-level natural-language task planning with a modular CNN through dynamic weight routing. It uses a few-shot prompted LLM (Llama-4-17B) to decompose prompts into structured tasks and to select task-specific pretrained weights for DeepFusionLab, so new tasks can be added without retraining the full model. Evaluated across 19 datasets and 12 tasks, the approach achieves near real-time performance (average latency ≈2.4–2.5 s) and 97% end-to-end correctness, with a Dice score of 0.9856 on lungs and strong F1 scores in classification. Combining instruction-driven planning with a single shared backbone supports flexible, prompt-driven medical imaging workflows, with potential extension to 3D modalities and broader clinical tasks.
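To make the idea of "structured tasks plus weight selection" concrete, here is a minimal sketch in Python. It assumes the LLM emits JSON of the form `{"tasks": [{"task": ..., "target": ...}]}`; that schema, the `WEIGHT_REGISTRY` paths, and the `StubModel` class are all hypothetical illustrations, not MedPrompt's actual output format or DeepFusionLab's API. The point it shows is that one shared model object is reused and only the loaded weights change per task.

```python
import json
from dataclasses import dataclass

# Hypothetical registry: (task type, target) -> task-specific weight file.
# Supporting a new task means adding an entry and its weights; the shared
# backbone code is not retrained or modified.
WEIGHT_REGISTRY = {
    ("segmentation", "lungs"): "weights/seg_lungs.pt",
    ("classification", "tuberculosis"): "weights/cls_tb.pt",
}


@dataclass(frozen=True)
class Task:
    task: str    # e.g. "segmentation" or "classification"
    target: str  # anatomy or condition named in the user prompt


def parse_plan(llm_output: str) -> list:
    """Parse the (assumed) JSON plan from the LLM into validated Task objects."""
    plan = json.loads(llm_output)
    tasks = []
    for item in plan["tasks"]:
        task = Task(item["task"].lower(), item["target"].lower())
        # Reject plans that ask for a task with no registered weights.
        if (task.task, task.target) not in WEIGHT_REGISTRY:
            raise ValueError(f"no weights registered for {task}")
        tasks.append(task)
    return tasks


class StubModel:
    """Hypothetical stand-in for a shared CNN; only records the loaded weights."""

    def __init__(self):
        self.loaded = None

    def load_weights(self, path: str) -> None:
        self.loaded = path  # a real model would read the checkpoint here

    def __call__(self, image):
        return f"output of {self.loaded}"


def route(tasks, model, image):
    """Run each planned task by swapping task-specific weights into one model."""
    results = {}
    for t in tasks:
        model.load_weights(WEIGHT_REGISTRY[(t.task, t.target)])
        results[(t.task, t.target)] = model(image)
    return results
```

For example, `route(parse_plan('{"tasks": [{"task": "segmentation", "target": "Lungs"}]}'), StubModel(), image=None)` loads the lung-segmentation weights and returns a single result keyed by `("segmentation", "lungs")`. Validating the plan against the registry before any inference is a design choice in this sketch, so a malformed or unsupported plan fails early rather than mid-pipeline.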

Abstract

Current medical image analysis systems are typically task-specific, requiring separate models for classification and segmentation, and lack the flexibility to support user-defined workflows. To address these challenges, we introduce MedPrompt, a unified framework that combines a few-shot prompted Large Language Model (Llama-4-17B) for high-level task planning with a modular Convolutional Neural Network (DeepFusionLab) for low-level image processing. The LLM interprets user instructions and generates structured output to dynamically route task-specific pretrained weights. This weight-routing approach avoids retraining the entire framework when new tasks are added: only task-specific weights are required, which improves scalability and eases deployment. We evaluated MedPrompt across 19 public datasets, covering 12 tasks spanning 5 imaging modalities. The system achieves 97% end-to-end correctness in interpreting and executing prompt-driven instructions, with an average inference latency of 2.5 seconds, making it suitable for near real-time applications. DeepFusionLab achieves competitive segmentation accuracy (e.g., Dice 0.9856 on lungs) and strong classification performance (F1 0.9744 on tuberculosis). Overall, MedPrompt enables scalable, prompt-driven medical imaging by combining the interpretability of LLMs with the efficiency of modular CNNs.
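The segmentation and classification results above are reported as Dice and F1 scores. As a reference for what those numbers measure, here is a minimal pure-Python sketch of both metrics using their standard definitions, Dice = 2|A∩B| / (|A| + |B|) and F1 = 2TP / (2TP + FP + FN). It assumes flattened binary (0/1) masks and binary labels; how the paper averages scores (per image, per dataset, or per class) is not stated here, so this is not a reproduction of its evaluation code.

```python
def dice(pred, truth, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for flattened 0/1 masks.

    eps keeps the score defined (and equal to 1) when both masks are empty.
    """
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return (2 * inter + eps) / (total + eps)


def f1_binary(y_pred, y_true, positive=1):
    """F1 score 2TP / (2TP + FP + FN) for binary labels."""
    pairs = list(zip(y_pred, y_true))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Writing both in terms of overlap makes one relationship visible: on binary masks, Dice is the pixel-wise F1 score, with predicted-foreground pixels as "predicted positive" and ground-truth foreground as "actual positive".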

Paper Structure

This paper contains 33 sections, 2 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: MedPrompt Architecture
  • Figure 2: Text Processing and Structured Output Generation
  • Figure 3: Architecture of DeepFusionLab
  • Figure 4: Atrous Spatial Pyramid Pooling Block Architecture
  • Figure 5: Multi Feature Fusion Block Architecture