SALAD: Smart AI Language Assistant Daily

Ragib Amin Nihal, Tran Dong Huu Quoc, Lin Zirui, Xu Yimimg, Liu Haoran, An Zhaoyi, Kyou Ma

TL;DR

The paper addresses foreigners' difficulties in learning Japanese and the inadequacy of conventional translators for language acquisition. It proposes SALAD, an AI-driven platform that integrates Kanji-Kana-Romaji translations, speech recognition, grammar explanations, vocabulary tracking, and lyrics-based song generation, leveraging tools like Whisper, gTTS, ChatGPT, and DiffSinger. The system architecture combines Translation, Vocabulary, Lyrics, and Song modules with dual UI implementations (Gradio web and PySide6 desktop) and a centralized progress database. Survey results indicate substantial perceived usefulness and potential to improve conversational fluency, while limitations include language pair scope and API dependency, suggesting directions for future extension.

Abstract

SALAD is an AI-driven language-learning application designed to help foreigners learn Japanese. It offers translations in Kanji-Kana-Romaji, speech recognition, translated audio, vocabulary tracking, grammar explanations, and songs generated from newly learned words. The app targets beginner and intermediate learners, aiming to make language acquisition more accessible and enjoyable. SALAD builds on the translations users make each day to improve their fluency and comfort when communicating with native speakers. Its primary objectives are effective Japanese language learning, user engagement, and progress tracking. Our survey found that 39% of foreigners in Japan feel uncomfortable in conversations with Japanese speakers, and over 60% of foreigners expressed confidence in SALAD's ability to enhance their Japanese language skills. The app uses large language models, speech recognition, and diffusion models to bridge the language gap and foster a more inclusive community in Japan.
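
As a rough illustration of the vocabulary and progress tracking mentioned above, the following minimal sketch keeps a per-session counter next to a persistent store. It is not the authors' implementation: the `ProgressTracker` class, the `vocabulary` table, and its columns are hypothetical names, and an in-memory SQLite database merely stands in for whatever centralized progress database SALAD actually uses.

```python
import sqlite3
from collections import Counter


class ProgressTracker:
    """Hypothetical tracker: session counts in memory, long-term counts in SQLite."""

    def __init__(self, db_path: str = ":memory:") -> None:
        # Local session data: reset every time the app starts.
        self.session_counts: Counter = Counter()
        # Stand-in for a central database (assumption: one row per word).
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS vocabulary ("
            "word TEXT PRIMARY KEY, times_seen INTEGER NOT NULL)"
        )

    def record(self, word: str) -> None:
        """Count one encounter of `word` in both the session and the database."""
        self.session_counts[word] += 1
        # Create the row on first sight, then increment it.
        self.db.execute(
            "INSERT OR IGNORE INTO vocabulary (word, times_seen) VALUES (?, 0)",
            (word,),
        )
        self.db.execute(
            "UPDATE vocabulary SET times_seen = times_seen + 1 WHERE word = ?",
            (word,),
        )
        self.db.commit()

    def times_seen(self, word: str) -> int:
        """Return the long-term count for `word`, or 0 if it was never recorded."""
        row = self.db.execute(
            "SELECT times_seen FROM vocabulary WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else 0


tracker = ProgressTracker()
for word in ["猫", "犬", "猫"]:
    tracker.record(word)
```

Passing a file path instead of `":memory:"` would let the long-term counts survive across sessions, while `session_counts` would still start empty each time.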

Paper Structure (18 sections, 6 figures, 1 table, 1 algorithm)

Figures (6)

  • Figure 1: Some of the survey results
  • Figure 2: System Architecture: The SALAD system architecture integrates four key modules (Translation, Vocabulary, Lyrics, and Song) into a single language learning and musical creation platform. The workflow begins with user input, either spoken or written in English; spoken input is transcribed into text by speech recognition technologies such as Whisper or Google ASR. The text is then translated into Japanese using ChatGPT's translation services. ChatGPT also aids in generating vocabulary and grammar analyses, while Google TTS technology converts text into speech for auditory learning. Progress is tracked at two levels, with local session data providing immediate feedback and a central database documenting long-term learning. The system's creative aspect emerges in the Lyrics and Song modules, where it crafts music by integrating newly learned words into pre-existing melodies to produce complete songs. This user-centric design supports both immediate and extended engagement with language and music learning.
  • Figure 3: Example of Module 1 and 2
  • Figure 4: System Integration in User Interface
  • Figure 5: System Implementation in UI
  • ...and 1 more figures
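
The workflow in the Figure 2 caption (input, speech recognition, translation, speech synthesis) can be sketched as a small pipeline. This is an illustrative sketch under stated assumptions, not the authors' code: `LessonResult`, `run_translation_pipeline`, and the three `fake_*` functions are hypothetical names. The fakes are offline stand-ins for the services the paper names (Whisper or Google ASR, ChatGPT, and Google TTS), so the example runs without any network access or API keys.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class LessonResult:
    english_text: str
    japanese_text: str
    audio: bytes


def run_translation_pipeline(
    user_input: Union[str, bytes],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> LessonResult:
    """Run one input through ASR (if spoken), translation, and TTS."""
    # Spoken input arrives as bytes and is transcribed first;
    # written input is already text and skips the ASR step.
    if isinstance(user_input, bytes):
        english = transcribe(user_input)
    else:
        english = user_input
    english = english.strip()
    japanese = translate(english)
    audio = synthesize(japanese)
    return LessonResult(english, japanese, audio)


# Offline stand-ins (assumptions): a real system would call an ASR model,
# a language model, and a TTS engine here instead.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    phrasebook = {"good morning": "おはようございます"}
    return phrasebook.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


spoken = run_translation_pipeline(
    b" Good morning ", fake_transcribe, fake_translate, fake_synthesize
)
written = run_translation_pipeline(
    "good morning", fake_transcribe, fake_translate, fake_synthesize
)
```

Passing the three stages in as callables keeps the pipeline independent of any one provider, which matches the caption's mention of interchangeable recognizers (Whisper or Google ASR).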