Dia-Lingle: A Gamified Interface for Dialectal Data Collection
Jiugeng Sun, Rita Sevastjanova, Sina Ahmadi, Rico Sennrich, Mennatallah El-Assady
TL;DR
The paper tackles the scarcity of dialectal data in NLP by introducing Dia-Lingle, a gamified interface that combines two data-collection components (Quiz and Match) with an uncertainty-driven active-learning loop to expand dialect corpora. It describes a dialect classifier, a hexagon-based geographic visualization, and a multi-layer interface designed to sustain user engagement through progressive difficulty. The key contributions are (i) a gamified data-collection approach for dialectal resources, (ii) the integration of active learning to guide sentence selection, and (iii) a visualization-centric method for representing dialect-region coverage. A usability evaluation shows high user satisfaction. By enabling community participation and providing a scalable, interpretable data-collection workflow, Dia-Lingle supports more inclusive, dialect-aware NLP technologies and future dialect-specific language modeling.
Abstract
Dialects suffer from a scarcity of computational textual resources, as they exist predominantly in spoken rather than written form and exhibit remarkable geographical diversity. Collecting dialect data and integrating it into current language technologies both pose significant challenges. Gamification has been shown to make remote data collection easier and to extend it to a substantially wider scale. This paper introduces Dia-Lingle, a gamified interface designed to improve and facilitate dialectal data collection tasks such as corpus expansion and dialect labelling. The platform features two key components: the first challenges users to rewrite sentences in their own dialect, identifies the dialect of each rewritten sentence with a classifier, and solicits feedback on the prediction; the second asks users to match sentences to their geographical locations. Dia-Lingle combines active learning with gamified difficulty levels, strategically encouraging prolonged user engagement while efficiently enriching the dialect corpus. A usability evaluation shows high levels of user satisfaction with the interface. Dia-Lingle is available at https://dia-lingle.ivia.ch/, and a demo video at https://youtu.be/0QyJsB8ym64.
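To illustrate how active learning can guide sentence selection, the following is a minimal sketch of uncertainty sampling. It is not the Dia-Lingle implementation: the abstract does not state which uncertainty measure is used, so Shannon entropy over the classifier's predicted dialect distribution is an assumption here, and the names `entropy`, `select_most_uncertain`, and the example pool are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(candidates, k=1):
    """Return the k sentences whose predicted distribution has the highest entropy.

    candidates: list of (sentence, probs) pairs, where probs is the
    classifier's probability for each dialect region (assumed to sum to 1).
    """
    ranked = sorted(candidates, key=lambda c: entropy(c[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


# Hypothetical classifier outputs over three dialect regions.
pool = [
    ("sentence A", [0.98, 0.01, 0.01]),  # confident prediction
    ("sentence B", [0.40, 0.35, 0.25]),  # near-uniform, most uncertain
    ("sentence C", [0.70, 0.20, 0.10]),
]

# The sentences the classifier is least sure about are shown to users first.
picked = select_most_uncertain(pool, k=2)
print(picked)
```

Under this assumption, sentences with a near-uniform predicted distribution are prioritized, so user feedback goes where the classifier benefits from it most.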
