Dynamik: Syntactically-Driven Dynamic Font Sizing for Emphasis of Key Information

Naoto Nishida, Yoshio Ishiguro, Jun Rekimoto, Naomi Yamashita

TL;DR

Dynamik tackles cognitive load in subtitle reading for non-native speakers by dynamically sizing keywords using a simple linguistic criterion that separates content words from function words. Implemented in a Unity-based real-time system with Azure Speech recognition and spaCy morphology/POS tagging, it supports three subtitle modes and is evaluated through crowd-sourced testing with 84 non-native English speakers across CNN news clips. Results indicate that Dynamik reduces mental workload and improves perceived comprehension among lower-English-proficiency participants, while producing no robust differences for native speakers. The approach offers practical potential to reduce subtitle display area and adapt to other languages, while inviting further work on alternative cues, refined keyword extraction, and latency optimization.
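
The content-word criterion and the three subtitle modes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input is a hand-tagged list of (word, POS) pairs standing in for spaCy output, the tag set uses Universal POS labels (with PROPN added as an assumption), the scale factors are hypothetical values, and the `<size=...>` markup is only an illustrative rich-text format for the display layer.

```python
from typing import List, Tuple

# Universal POS tags treated as content words. NOUN, ADJ, VERB and AUX follow
# the paper's description (nouns, adjectives, verbs, auxiliary verbs);
# PROPN is an assumption made for this sketch.
CONTENT_POS = {"NOUN", "PROPN", "ADJ", "VERB", "AUX"}

Tagged = List[Tuple[str, str]]    # (word, POS tag)
Sized = List[Tuple[str, float]]   # (word, relative font scale)


def size_words(tagged: Tagged, mode: str,
               content_scale: float = 1.2,
               function_scale: float = 0.7) -> Sized:
    """Assign a relative font scale to each word for one subtitle mode.

    The default scale factors are hypothetical, not values from the paper.
    """
    if mode == "normal":
        # Every word is shown at the base size.
        return [(word, 1.0) for word, _ in tagged]
    if mode == "keyword":
        # Function words are omitted; content words keep the base size.
        return [(word, 1.0) for word, pos in tagged if pos in CONTENT_POS]
    if mode == "dynamik":
        # Content words are enlarged and function words are shrunk.
        return [(word, content_scale if pos in CONTENT_POS else function_scale)
                for word, pos in tagged]
    raise ValueError(f"unknown subtitle mode: {mode}")


def to_markup(sized: Sized) -> str:
    """Render sized words as a string with illustrative <size=N%> tags."""
    parts = []
    for word, scale in sized:
        if scale == 1.0:
            parts.append(word)
        else:
            parts.append(f"<size={round(scale * 100)}%>{word}</size>")
    return " ".join(parts)


# Hand-tagged sample sentence; in a real pipeline the tags would come from a
# POS tagger applied to the speech recognition result.
SAMPLE: Tagged = [
    ("The", "DET"), ("president", "NOUN"), ("will", "AUX"),
    ("visit", "VERB"), ("the", "DET"), ("city", "NOUN"),
]
```

Calling `to_markup(size_words(SAMPLE, "keyword"))` yields `president will visit city`, while the `"dynamik"` mode keeps all six words and wraps each one in a size tag. Keeping the sizing step separate from the rendering step means the same scales could feed any display layer.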

Abstract

In today's globalized world, there are increasing opportunities for individuals to communicate using a common non-native language (lingua franca). Non-native speakers often have opportunities to listen to foreign languages, but may not comprehend them as fully as native speakers do. To aid real-time comprehension, live transcription of subtitles is frequently used in everyday life (e.g., during Zoom conversations, watching YouTube videos, or on social networking sites). However, simultaneously reading subtitles while listening can increase cognitive load. In this study, we propose Dynamik, a system that reduces cognitive load during reading by decreasing the size of less important words and enlarging important ones, thereby enhancing sentence contrast. Our results indicate that Dynamik can reduce certain aspects of cognitive load, specifically perceived performance and effort, among individuals with low English proficiency, and can enhance users' sense of comprehension, especially among those with low English ability. We further discuss our method's applicability to other languages, potential improvements, and further research directions.

Paper Structure

This paper contains 37 sections, 1 equation, 10 figures, 15 tables.

Figures (10)

  • Figure 1: Three conditions in the experiment: Normal subtitle (left); Keyword subtitle, which omits function words (center); and Dynamik subtitle, which reduces the font size of function words (right).
  • Figure 2: System workflow. It begins with capturing English audio input using the PC's built-in microphone. The audio is then processed through speech recognition, followed by morphological analysis of the real-time speech recognition results. Based on this analysis, words that are not considered content words (nouns, adjectives, verbs, and auxiliary verbs) are reduced in size or omitted, depending on the subtitle type. The processed text is then displayed on a Unity-based interface.
  • Figure 3: Main part of the experiment workflow. i) Participants listened to an audio track and completed 10 TOEFL-adapted listening comprehension questions (Pre-test) to assess their English listening skills. ii) Participants then watched six CNN news clips [cnn] with assistive subtitles (one of Normal, Keyword, or Dynamik). After each clip, participants completed three questionnaires rating their self-perceived engagement with the clip, their self-perceived comprehension of the clip, and the readability of the subtitles. They then completed a NASA-TLX assessment [HART1988139] and a listening comprehension quiz (Comprehension Quiz) on the clip. This step was repeated six times with the video clips in randomized order (two clips per condition).
  • Figure 4: Condition distribution for each video news clip.
  • Figure 5: Demographic distribution of the survey participants. Note that the colors correspond to languages in graphs A) and B) but not in graphs C) and D). A) Distribution of native languages. B) Distribution of other languages that the participants can speak. C) Native language distribution, the same as A), but colored by language family. D) Distribution of nationality.
  • ...and 5 more figures