Can LLM Agents Identify Spoken Dialects like a Linguist?

Tobias Bystrich, Lukas Hamm, Maria Hassan, Lea Fischbach, Lucie Flek, Akbar Karimi

Abstract

Because labeled dialectal speech is scarce, audio dialect classification is challenging for most languages, including Swiss German. In this work, we explore whether large language models (LLMs), acting as agents, can understand dialects and whether they can match the performance of models such as HuBERT in dialect classification. In addition, we provide an LLM baseline and a human linguist baseline. Our approach uses phonetic transcriptions produced by ASR systems and combines them with linguistic resources such as dialect feature maps, vowel history, and rules. Our findings indicate that LLM predictions improve when linguistic information is provided. The human baseline shows that automatically generated transcriptions can be beneficial for such classification, but it also reveals room for improvement.

Paper Structure

This paper contains 27 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Graphic for SwissDial classes and Innerschweiz made with REDE-SprachGIS and dialect region background HSK1_2.
  • Figure 2: The agentic framework for dialect analysis
  • Figure 3: Sample transcription with Highest Alemannic features. Orthographic transcription: Wir haben Yvonne Beutler an ihrem letzten Arbeitstag besucht und auf ihre politische Laufbahn zurückgeschaut [-gelugt]
  • Figure 4: HuBERT baseline architecture
  • Figure 5: Alignments between human baseline and GPT models