Democratizing MLLMs in Healthcare: TinyLLaVA-Med for Efficient Healthcare Diagnostics in Resource-Constrained Settings

Aya El Mir; Lukelo Thadei Luoga; Boyuan Chen; Muhammad Abdullah Hanif; Muhammad Shafique

TL;DR

This paper adapts the general-purpose MLLM TinyLLaVA to the medical domain, renaming it TinyLLaVA-Med. The adapted model can be deployed in hardware-constrained environments with low computational resources. It maintains essential functionality and delivers accuracies close to state-of-the-art models.

Abstract

Deploying Multi-Modal Large Language Models (MLLMs) in healthcare is hindered by their high computational demands and significant memory requirements. These demands are especially challenging for resource-constrained devices such as the NVIDIA Jetson Xavier. The problem is most acute in remote medical settings, where advanced diagnostics are needed but resources are limited. In this paper, we introduce an optimization method for the general-purpose MLLM TinyLLaVA, which we have adapted and renamed TinyLLaVA-Med. Drawing inspiration from the LLaVA-Med training pipeline, this adaptation instruction-tunes and fine-tunes TinyLLaVA on a medical dataset. Our approach reduces computational complexity and power consumption. TinyLLaVA-Med operates at 18.9 W and uses 11.9 GB of memory, while achieving closed-ended accuracies of 64.54% on VQA-RAD and 70.70% on SLAKE. TinyLLaVA-Med is therefore viable for deployment in hardware-constrained environments with low computational resources, maintaining essential functionality and delivering accuracies close to state-of-the-art models.
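Closed-ended questions in VQA-RAD and SLAKE (for example yes/no questions) are commonly scored as the fraction of predicted answers that match the reference after simple normalization. The sketch below assumes exact-match scoring. The `normalize` and `closed_ended_accuracy` helpers and their normalization rule are hypothetical illustrations, not the paper's evaluation code.

```python
import string


def normalize(answer: str) -> str:
    """Lowercase and trim whitespace and punctuation (assumed normalization rule)."""
    return answer.strip().lower().strip(string.punctuation + " ")


def closed_ended_accuracy(predictions, references):
    """Return the percentage of closed-ended questions answered exactly right.

    Assumes one prediction per reference, scored by exact match after normalize().
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("no questions to score")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical toy answers, not data from the paper.
preds = ["Yes", "no.", "yes", "No"]
refs = ["yes", "no", "no", "no"]
print(f"{closed_ended_accuracy(preds, refs):.2f}%")  # 75.00%
```

Open-ended questions usually need a softer metric, such as token recall, so this exact-match rule applies only to the closed-ended split.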

Paper Structure (23 sections, 5 figures, 2 tables)

INTRODUCTION
BACKGROUND AND RELATED WORK
Multimodal Large Language Models in Healthcare
TinyLLaVA
METHODOLOGY
Instruction-Tuning
Fine-tuning to Downstream Datasets
Deployment on Embedded Device
RESULTS
Datasets
Evaluation Metrics for TinyLLaVA-Med
Medical Capability Metrics
Hardware Deployment Evaluation Metrics
GPU Utilization Efficiency
Power Efficiency
...and 8 more sections

Figures (5)

Figure 1: TinyLLaVA architecture (zhou2024tinyllava)
Figure 2: Flowchart illustrating the methodology of adapting TinyLLaVA into TinyLLaVA-Med for deployment on embedded devices.
Figure 3: Training loss of TinyLLaVA-Med on the PMC-15M dataset over epochs during the instruction-tuning stage, indicating effective learning and model convergence.
Figure 4: Hardware setup of the TinyLLaVA-Med model on NVIDIA Jetson Xavier, demonstrating the model's deployment and integration into a real-world medical environment.
Figure 5: Close-up of the TinyLLaVA-Med chat interface deployed on NVIDIA Jetson Xavier, facilitating real-time medical image analysis.