VOICE: Visual Oracle for Interaction, Conversation, and Explanation
Donggang Jia, Alexandra Irger, Lonni Besancon, Ondrej Strnad, Deng Luo, Johanna Bjorklund, Anders Ynnerman, Ivan Viola
TL;DR
VOICE bridges large language models and real-time 3D molecular visualization to enable natural, voice-driven explanations of complex biological structures for lay audiences. It introduces a pack-of-bots dialogue system and a scene-tree–based interactive text-to-visualization pipeline to generate coherent visual narrations aligned with user queries. Evaluation with science-communication experts shows low latency, accurate content, and valuable feedback on guidance and representation, indicating strong potential for autonomous science communication in public centers. Future work will pursue unified instruction-extraction models, deeper integration of visual state into dialogue, and dynamic animations to further enhance learning outcomes.
Abstract
We present VOICE, a novel approach to science communication that connects the conversational capabilities of large language models (LLMs) with interactive exploratory visualization. VOICE introduces several technical contributions that drive our conversational visualization framework. Its foundation is a pack-of-bots, in which each bot performs a specific task such as assigning tasks, extracting instructions, or generating coherent content. We employ fine-tuning and prompt engineering to tailor each bot's performance to its role and to respond accurately to user queries. Our interactive text-to-visualization method generates a flythrough sequence that matches the content explanation. In addition, natural language interaction lets users navigate and manipulate the 3D models in real time. The VOICE framework accepts arbitrary voice commands from the user and responds verbally, with the response tightly coupled to the corresponding visual representation and delivered with low latency and high accuracy. We demonstrate the effectiveness of our approach in the molecular visualization domain by analyzing three 3D molecular models with multi-scale and multi-instance attributes. Finally, we evaluate VOICE with educational experts to show the potential of our approach. All supplemental materials are available at https://osf.io/g7fbr.
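To make the pack-of-bots idea concrete, the following is a minimal sketch of how a manager bot might route a query either to an instruction-extraction bot or to a content-generation bot. It is an illustration under stated assumptions, not the VOICE implementation: the names `manager_bot`, `instruction_bot`, `explainer_bot`, `Reply`, and `NAV_VERBS` are hypothetical, and simple keyword matching stands in for the fine-tuned and prompt-engineered LLMs described above.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary of navigation verbs; a real system would rely on an LLM
# rather than keyword matching to decide what the user wants.
NAV_VERBS = {"zoom", "rotate", "hide", "focus"}


@dataclass
class Reply:
    bot: str                                          # which bot handled the query
    speech: str                                       # text to be spoken back to the user
    instruction: dict = field(default_factory=dict)   # command for the 3D view, if any


def _tokens(query: str) -> list:
    """Lowercase the query, drop trailing punctuation, and split on whitespace."""
    return query.lower().rstrip("?.!").split()


def explainer_bot(query: str) -> Reply:
    """Stand-in for the content-generation bot (an LLM call in a real system)."""
    return Reply("explainer", f"Here is an explanation of: {query}")


def instruction_bot(query: str) -> Reply:
    """Stand-in for the instruction-extraction bot: picks a verb and a target."""
    words = _tokens(query)
    verb = next(w for w in words if w in NAV_VERBS)
    target = words[-1]  # toy heuristic: the last word names the structure
    return Reply("instruction", f"OK, {verb}: {target}.",
                 {"action": verb, "target": target})


def manager_bot(query: str) -> Reply:
    """Stand-in for the task-assigning bot: route by the presence of a navigation verb."""
    is_navigation = any(w in NAV_VERBS for w in _tokens(query))
    return instruction_bot(query) if is_navigation else explainer_bot(query)


if __name__ == "__main__":
    print(manager_bot("Zoom in on the membrane"))
    print(manager_bot("What is a capsid?"))
```

In this sketch the structured `instruction` dictionary is what a visualization layer would consume, while `speech` is what would be passed to text-to-speech; keeping the two in one reply mirrors the coupling of verbal and visual responses described in the abstract.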
