GlobalizeEd: A Multimodal Translation System that Preserves Speaker Identity in Academic Lectures

Hoang-Son Vo; Karina Kolmogortseva; Ngumimi Karen Iyortsuun; Hong-Duyen Vo; Soo-Hyung Kim

TL;DR

GlobalizeEd tackles the challenge of translating academic lectures while preserving speaker identity and enabling tone-appropriate, culture-aware delivery. The authors implement a multimodal pipeline that combines segment-wise, LLM-driven translation with accent-preserving voice cloning and diffusion-based lip synchronization, augmented by duration alignment and a user-centric interface. In a mixed-methods study with 18 instructors and 18 students, the system reduces cognitive load and enhances engagement compared to traditional subtitles, while maintaining learning effectiveness similar to that of high-quality subtitles. The work provides a practical, user-centered AI framework for cross-cultural education, demonstrating how linguistic fidelity, cultural adaptability, and user control can yield more inclusive global learning experiences.

Abstract

A large amount of valuable academic content is available only in its original language, creating a significant access barrier for the global student community. Translation is particularly challenging in subjects such as history, culture, and the arts, where current automated subtitle tools fail to convey the appropriate pedagogical tone and specialized meaning. In addition, reading traditional automated subtitles increases cognitive load and leads to a disconnected learning experience. Through a mixed-methods study involving 36 participants, we found that GlobalizeEd's dubbed formats significantly reduce cognitive load and offer a more immersive learning experience compared to traditional subtitles. Although learning effectiveness was comparable between high-quality subtitles and dubbed formats, both groups valued GlobalizeEd's ability to preserve the speaker's voice, which enhanced perceived authenticity. Instructors rated translation accuracy and vocal naturalness, whereas students reported that synchronized, identity-preserving outputs fostered engagement and trust. This work contributes a novel human-centered AI framework for cross-lingual education, demonstrating how multimodal translation systems can balance linguistic fidelity, cultural adaptability, and user control to create more inclusive global learning experiences.
