GigaPevt: Multimodal Medical Assistant

Pavel Blinov; Konstantin Egorov; Ivan Sviridov; Nikolay Ivanov; Stepan Botman; Evgeniy Tagin; Stepan Kudin; Galina Zubkova; Andrey Savchenko

TL;DR

Addresses the challenge of multimodal patient understanding in medical AI by introducing GigaPevt, a multimodal medical assistant that combines LLM-based dialog with specialized medical models processing visual, audio, and text cues. It uses a client-server architecture: a Python frontend runs the low-latency modules, and a Flask backend contains a model manager, the specialized models, and the GP Dialog Logic, which is augmented with Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) prompting. On the RuMed benchmarks, it improves accuracy over a GigaChat baseline by 1.18% on RuMedDaNet and by 0.38% on RuMedNLI, highlighting the benefit of multimodal context for dialog quality. Qualitative dialog showcases show more precise, grounded responses when multimodal data are used, suggesting potential for closer integration with patient records and for future improvements in knowledge management.
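The fusion of specialized-model outputs with the LLM prompt can be illustrated with a small Python sketch. The paper does not publish its backend code, so everything here is hypothetical: `ModelManager`, `toy_bmi`, `toy_heart_rate`, `retrieve`, and `build_prompt` are illustrative names. The two "models" are simple stand-ins that read pre-extracted numbers (the standard weight/height² BMI formula and peak counting on an already extracted pulse signal) rather than the neural video models GigaPevt uses, retrieval is plain word overlap instead of a real RAG retriever, and the CoT cue is the generic zero-shot phrase "Let's think step by step." The sketch shows only the data flow: the model manager runs every registered model, and the dialog logic folds their findings and the retrieved passages into one prompt for the LLM.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a specialized model maps one observation
# (a dict of pre-extracted signals) to a short textual finding.
SpecializedModel = Callable[[dict], str]


def toy_bmi(obs: dict) -> str:
    # Stand-in for the BMI model: plain formula weight / height^2,
    # not the image-based estimator described in the paper.
    bmi = obs["weight_kg"] / obs["height_m"] ** 2
    return f"BMI {bmi:.1f}"


def toy_heart_rate(obs: dict) -> str:
    # Stand-in for the rPPG model: counts local maxima in an already
    # extracted pulse signal sampled at `fps` samples per second.
    signal, fps = obs["ppg_signal"], obs["fps"]
    if len(signal) < 3:
        return "heart rate unavailable"
    peaks = sum(
        1 for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    )
    bpm = peaks * 60 * fps / len(signal)
    return f"heart rate ~{bpm:.0f} bpm"


@dataclass
class ModelManager:
    """Hypothetical registry that runs all specialized models on one observation."""
    models: Dict[str, SpecializedModel] = field(default_factory=dict)

    def register(self, name: str, model: SpecializedModel) -> None:
        self.models[name] = model

    def analyze(self, obs: dict) -> Dict[str, str]:
        # Collect one textual finding per registered model.
        return {name: model(obs) for name, model in self.models.items()}


def _words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, knowledge_base: List[str], k: int = 1) -> List[str]:
    # Toy retriever: rank passages by word overlap with the query.
    q = _words(query)
    ranked = sorted(knowledge_base, key=lambda doc: len(q & _words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, findings: Dict[str, str], passages: List[str]) -> str:
    # Fold multimodal findings and retrieved passages into one LLM prompt,
    # ending with a generic zero-shot chain-of-thought cue.
    lines = ["Patient context from specialized models:"]
    lines += [f"- {name}: {value}" for name, value in sorted(findings.items())]
    lines.append("Reference material:")
    lines += [f"- {p}" for p in passages]
    lines.append(f"Question: {question}")
    lines.append("Let's think step by step.")
    return "\n".join(lines)


if __name__ == "__main__":
    manager = ModelManager()
    manager.register("bmi", toy_bmi)
    manager.register("heart_rate", toy_heart_rate)
    obs = {"weight_kg": 80, "height_m": 1.8, "ppg_signal": [0, 1, 0] * 10, "fps": 3}
    kb = [
        "Normal resting heart rate is 60 to 100 bpm.",
        "BMI between 18.5 and 24.9 is normal.",
    ]
    question = "Is my heart rate normal?"
    print(build_prompt(question, manager.analyze(obs), retrieve(question, kb)))
```

Keeping every specialized model behind the same callable interface lets the manager add or drop modalities without touching the prompt-building code. The Flask routing layer and the actual LLM call are deliberately omitted.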

Abstract

Building an intelligent and efficient medical assistant is still a challenging AI problem. The major limitation is the scarcity of data modalities, which prevents a comprehensive perception of the patient. This demo paper presents GigaPevt, the first multimodal medical assistant that combines the dialog capabilities of large language models with specialized medical models. This approach shows immediate advantages in dialog quality and metric performance, including a 1.18% accuracy improvement on the question-answering task.

Paper Structure

This paper contains 14 sections, 2 figures, and 2 tables.

Introduction
GigaPevt architecture
Specialized models
Video-based Facial Analytics
User Identification
Socio-Demographic Model
Facial Expression Recognition
Body Mass Index Model
rPPG Model
GP Dialog Logic
Experiments
Performance Evaluation
Dialog Showcases
Conclusion and Future Work

Figures (2)

Figure 1: GigaPevt UI
Figure 2: GigaPevt architecture
