GigaPevt: Multimodal Medical Assistant

Pavel Blinov, Konstantin Egorov, Ivan Sviridov, Nikolay Ivanov, Stepan Botman, Evgeniy Tagin, Stepan Kudin, Galina Zubkova, Andrey Savchenko

TL;DR

Addresses the challenge of multimodal patient understanding in medical AI by introducing GigaPevt, a multimodal medical assistant that fuses LLM dialog with specialized medical models operating on visual, audio, and text cues. Uses a client-server architecture: a Python frontend hosts the low-latency modules, and a Flask backend contains a model manager, the specialized models, and the GP Dialog Logic, augmented by Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) prompting. On RuMed benchmarks, it achieves a 1.18% accuracy improvement over a GigaChat baseline on RuMedDaNet and a 0.38% gain on RuMedNLI, highlighting the benefit of multimodal context for dialog quality. Qualitative dialog examples show more precise, grounded responses when multimodal data are used, and suggest potential for closer integration with patient records and further improvements in knowledge management.
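The backend flow described above can be illustrated with a minimal sketch: a model manager routes each input modality to a specialized model, the resulting findings are combined with retrieved reference passages (RAG), and a Chain-of-Thought cue is appended before the prompt goes to the LLM. This is not the authors' implementation. `ModelManager`, `retrieve`, and `build_prompt` are hypothetical names; the specialized models are stub lambdas; retrieval is a toy word-overlap ranker rather than whatever retriever GigaPevt actually uses; and the Flask layer and the LLM call itself are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a specialized model maps a raw input (image path, audio
# clip, text, ...) to a short textual finding that the LLM can read.
SpecializedModel = Callable[[object], str]


@dataclass
class ModelManager:
    """Hypothetical registry that routes each modality to a specialized model."""
    models: Dict[str, SpecializedModel] = field(default_factory=dict)

    def register(self, modality: str, model: SpecializedModel) -> None:
        self.models[modality] = model

    def describe(self, inputs: Dict[str, object]) -> List[str]:
        # Run only the models whose modality is present in this request;
        # unknown modalities are silently skipped.
        return [f"{m}: {self.models[m](x)}" for m, x in inputs.items() if m in self.models]


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    """Toy RAG step: rank passages by word overlap with the query (ties keep input order)."""
    q = set(query.lower().split())
    ranked = sorted(knowledge_base, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]


def build_prompt(question: str, findings: List[str], passages: List[str]) -> str:
    """Assemble multimodal findings, retrieved context, and a CoT instruction."""
    lines = ["Patient observations:"]
    lines += [f"- {f}" for f in findings]
    lines.append("Reference material:")
    lines += [f"- {p}" for p in passages]
    lines.append(f"Question: {question}")
    lines.append("Let's think step by step.")  # Chain-of-Thought cue
    return "\n".join(lines)


# Usage with stub models standing in for real visual and audio analyzers.
manager = ModelManager()
manager.register("face", lambda img: "signs of fatigue")  # stub visual model
manager.register("speech", lambda clip: "hoarse voice")   # stub audio model
findings = manager.describe({"face": "frame.jpg", "speech": "clip.wav"})
kb = ["Hoarse voice may indicate laryngitis.", "Fatigue has many causes.", "Unrelated note."]
prompt = build_prompt("Why is my voice hoarse?", findings, retrieve("why is my voice hoarse", kb))
print(prompt)
```

Keeping the specialized models behind a single registry means a new modality only needs a `register` call, while the dialog logic sees every modality through the same textual interface.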

Abstract

Building an intelligent and efficient medical assistant remains a challenging AI problem. The major limitation is the scarcity of data modalities, which prevents a comprehensive perception of the patient. This demo paper presents GigaPevt, the first multimodal medical assistant that combines the dialog capabilities of large language models with specialized medical models. This approach shows immediate advantages in dialog quality and metric performance, including a 1.18% accuracy improvement on the question-answering task.

Paper Structure (14 sections, 2 figures, 2 tables)