A Language-agnostic Model of Child Language Acquisition
Louis Mahon, Omri Abend, Uri Berger, Katherine Demuth, Mark Johnson, Mark Steedman
TL;DR
This work investigates whether a language-agnostic semantic bootstrapping model of child language acquisition (CLA) can transfer from English to Hebrew. It reimplements the CCG-based framework of Abend et al. (2017), training on real CHILDES utterances paired with logical forms and using an EM-style algorithm with Dirichlet-process conditionals to jointly learn syntax and word meanings. On both English (Adam) and Hebrew (Hagar), the model achieves high word-meaning accuracy and learns a dominant SVO order, but on Hebrew its learning of word order and syntactic categories is slower and less robust, in part because of Hebrew's richer morphology. The findings highlight the value of cross-language evaluation for CLA models and point to morphology-aware extensions as a way to improve multilingual acquisition.
Abstract
This work reimplements a recent semantic bootstrapping child-language acquisition model, which was originally designed for English, and trains it to learn a new language: Hebrew. The model learns from pairs of utterances and logical forms as meaning representations, and acquires both syntax and word meanings simultaneously. The results show that the model mostly transfers to Hebrew, but that a number of factors, including the richer morphology of Hebrew, make the learning slower and less robust. This suggests that a clear direction for future work is to enable the model to leverage the similarities between different word forms.
