What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects

Verena Blaschke, Christoph Purschke, Hinrich Schütze, Barbara Plank

TL;DR

The paper addresses a gap in our understanding of what dialect communities need from language technology (LT), focusing on dialects and regional languages related to German. It uses a cross-regional online survey (N=327 dialect speakers) to assess attitudes toward hypothetical dialect LTs and finds that respondents favour tools that process dialect input, especially speech, over tools that produce dialect output. Key contributions include subgroup analyses (activists vs. non-activists; regional differences) and correlations linking attitudes towards dialect orthography to receptivity to LT. The findings suggest that NLP for dialects should prioritize spoken-language processing and that sociolinguistic factors modulate how desirable a given LT is, which can guide more community-aligned LT development.

Abstract

Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations' needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German -- a group of varieties that is heterogeneous in terms of prestige and standardization. We survey speakers of these varieties (N=327) and present their opinions on hypothetical language technologies for their dialects. Although attitudes vary among subgroups of our respondents, we find that respondents are especially in favour of potential NLP tools that work with dialectal input (especially audio input) such as virtual assistants, and less so for applications that produce dialectal output such as machine translation or spellcheckers.

Paper Structure (23 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Countries and German states in which the respondents' dialects are spoken, with the number of respective respondents, and the overall age distribution.
  • Figure 2: Opinions on potential language technologies for dialects. STT=speech-to-text, TTS=text-to-speech, dial=dialect, deu=German, oth=other languages, MT=machine translation, cannot judge=skip question.
  • Figure 3: Spearman's $\rho$ between variables. Blue dots show positively correlated variables (max.: +0.77), red dots negatively correlated ones (min.: -0.50). We only include correlations with p-values under 0.05. The larger the dot, the smaller the p-value. The numbers behind the variables refer to the questions in Appendix §\ref{sec:full-questionnaire}. For further explanations of how the variables are coded, see Appendix §\ref{sec:correlations}.
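
The Figure 3 caption describes rank correlations that are kept only when their p-value is under 0.05. The following is a minimal sketch of that kind of analysis, not the authors' code: the page does not say which software or significance test was used, so the permutation test here is a swapped-in assumption, and the function names and the example answers are hypothetical.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if a variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value (an assumption, not the paper's test)."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical Likert-style answers from eight respondents (not survey data).
wants_dialect_stt = [1, 2, 2, 3, 4, 4, 5, 5]
writes_in_dialect = [1, 1, 2, 3, 3, 4, 4, 5]

rho = spearman_rho(wants_dialect_stt, writes_in_dialect)
p = permutation_p_value(wants_dialect_stt, writes_in_dialect)
if p < 0.05:  # same threshold as in the caption
    print(f"rho = {rho:.2f}, p = {p:.4f}")
```

Ordinal survey answers contain many ties, which is why the sketch assigns average ranks instead of using the shortcut formula based on squared rank differences, which is exact only without ties.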