Word Ladders: A Mobile Application for Semantic Data Collection

Marianna Marcella Bolognesi, Claudia Collacciani, Andrea Ferrari, Francesca Genovese, Tommaso Lamarra, Adele Loia, Giulia Rambelli, Andrea Amelio Ravelli, Caterina Villani

TL;DR

Word Ladders presents a gamified mobile platform for collecting hierarchical semantic data via IS-A word ladders in English and Italian, targeting both linguistic resources and cognitive research. The system uses a React Native frontend, a NodeJS/MongoDB backend, and AWS hosting to gather anonymized sociolinguistic data and to construct specificity metrics and a hierarchical taxonomy; ladder quality is scored against MultiWordNet with a formula that balances validated and novel entries, plus a time-based bonus. The paper details the game rules, data architecture, and preliminary analyses (roughly 30k games from ~3k users in six months), and demonstrates educational deployment in Italian schools alongside plans to scale English data and compare human vs. LLM categorizations. Overall, Word Ladders offers a scalable approach to generating cross-language lexical resources and probing cognition and readability, with practical impact for NLP tasks and educational vocabulary training.

Abstract

Word Ladders is a free mobile application for Android and iOS, developed within the Abstraction project (ERC-2021-STG-101039777) to collect linguistic data, specifically lists of words related to each other through semantic relations of categorical inclusion. We provide an overview of Word Ladders, explaining its game logic, its motivation, and its expected results and applications, both to NLP tasks and to open questions in cognitive science.

Paper Structure (15 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Schematic representation of the data collection and processing workflow. (a) Data are collected through Word Ladders, anonymized, and stored on an AWS server. (b) Stored data are accessed using the Postman API and converted into a graph. (c) The resulting graph is post-processed to detect typos and remove noisy ladders. (d) The final graph is used to (i) understand the semantic organization of users from different sociodemographic backgrounds and (ii) compute the Specificity rating for the given words.
  • Figure 2: Education (a) and profession (b) information about Word Ladders users.
  • Figure 3: Cumulative counts for number of users (a) and number of played games (b).
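
Step (b) of Figure 1, converting stored ladders into a graph, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `ladders_to_edges`, the input shape (each ladder as a list of words), and the assumption that each ladder is ordered from most specific to most generic are all hypothetical choices made for illustration.

```python
from collections import Counter


def ladders_to_edges(ladders):
    """Count directed IS-A edges (narrower -> broader) across ladders.

    Assumption (not from the paper): each ladder is a list of words
    ordered from the most specific term to the most generic one.
    """
    edges = Counter()
    for ladder in ladders:
        # Normalize case and whitespace; drop empty entries.
        words = [w.strip().lower() for w in ladder if w and w.strip()]
        # Each pair of adjacent rungs yields one narrower -> broader edge.
        for narrower, broader in zip(words, words[1:]):
            if narrower != broader:  # skip self-loops from repeated words
                edges[(narrower, broader)] += 1
    return edges


# Hypothetical example ladders, invented for illustration only.
example_ladders = [
    ["poodle", "dog", "animal"],
    ["Dog", "animal"],
]
example_edges = ladders_to_edges(example_ladders)
```

The edge counts give a weighted directed graph; under the same assumptions, low-count edges could be candidates for the typo and noise filtering described in step (c).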