MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning

Nadia Saeed

TL;DR

The paper tackles open-ended, multilingual, multimodal question answering (QA) in dermatology under data scarcity. It introduces MediFact-M3G, which combines weakly supervised image representations learned with a VGG16-CNN-SVM model and multilingual text through multimodal QA fusion. The system integrates extractive and abstractive QA models enriched with image features, uses CLIP-based contrastive learning to select among candidate responses, and includes cross-language translation. It reports competitive performance on the MEDIQA-M3G 2024 shared task across English, Chinese, and Spanish, which supports the viability of multilingual, multimodal QA for teledermatology. The work contributes to clinical decision support by enabling open-ended, cross-language dermatology QA, and it points to larger-scale datasets, ontologies, and real-world validation as future directions.

Abstract

The MEDIQA-M3G 2024 challenge calls for novel solutions for Multilingual & Multimodal Medical Answer Generation in dermatology (wai Yim et al., 2024a). This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question answering (QA). Our system uses the readily available MEDIQA-M3G images with a VGG16-CNN-SVM model to learn informative skin condition representations in a multilingual setting (English, Chinese, Spanish). Using pre-trained QA models, we further bridge the gap between visual and textual information through multimodal fusion. This approach handles complex, open-ended questions even without predefined answer choices. We generate comprehensive answers by feeding the ViT-CLIP model multiple candidate responses alongside the images. This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.
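
The response-selection step mentioned above can be illustrated with a minimal sketch. The abstract does not state how candidates are scored, so this assumes the usual CLIP-style setup: the image and each candidate response are embedded in a shared space, and the response with the highest cosine similarity to the image is chosen. The function names `cosine` and `select_response` are hypothetical, and the vectors are toy values, not outputs of the ViT-CLIP model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_response(image_emb, response_embs):
    """Return the index of the best-matching response and all scores.

    Assumption: image and text embeddings already live in one shared
    space, as they would after passing through a CLIP-style encoder.
    """
    scores = [cosine(image_emb, r) for r in response_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy embeddings (illustrative values only, not real model outputs).
image_emb = [1.0, 0.0, 1.0]
response_embs = [
    [0.0, 1.0, 0.0],  # orthogonal to the image embedding
    [1.0, 0.0, 0.9],  # nearly parallel to the image embedding
    [1.0, 1.0, 0.0],  # partially aligned
]

best_index, scores = select_response(image_emb, response_embs)
print(best_index, [round(s, 3) for s in scores])
```

In a real pipeline the toy vectors would be replaced by embeddings from the image and text encoders; the selection logic itself stays the same.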

Paper Structure (14 sections, 2 figures, 2 tables)