
A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation

David Ph. Shakouri, Crit Cremers, Niels O. Schiller

TL;DR

This work presents MODOMA, a two-agent laboratory for unsupervised, first-language-like acquisition in which a daughter agent learns functional and content word categories from an adult Dutch grammar model, Delilah, using explicit graph-based representations and unification. The study shows that, by exploiting Zipfian distributional patterns in the machine-generated input, the daughter agent can acquire discrete grammatical categories and use them to generate and parse utterances, with strong evidence of alignment to the mother’s grammar ($p<0.001$) when sufficient data (10,000 exemplars) are used. The work highlights the value of explicit knowledge representations and interactive, interpretable language learning, and it outlines parameter settings, validation on new data, and avenues for extending the approach to more complex grammatical phenomena. It thus establishes a foundation for interpretable, multi-agent simulations of language acquisition and points to future work on feedback, broader linguistic structures, and scalability.
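
Unification of feature structures can be illustrated with a minimal sketch. It assumes that feature structures are nested Python dictionaries without structure sharing (reentrancy), which genuine graph-based unification supports; the function `unify` and the feature names `agr`, `gender` and `num` are hypothetical and do not reproduce Delilah's actual representations. Two structures unify if every feature they share has compatible values; otherwise unification fails.

```python
from typing import Any, Optional

# Hypothetical representation: a feature structure is a dict mapping feature
# names to atomic values (strings) or to nested feature structures.
# None is reserved to signal unification failure and is not a valid value.
FeatureStructure = dict[str, Any]


def unify(a: Any, b: Any) -> Optional[Any]:
    """Return the unification of a and b, or None if their values clash."""
    # Atomic values (or an atom versus a structure) unify only if identical.
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    # Copy a so that the input structures are never mutated.
    result: FeatureStructure = dict(a)
    for feature, b_value in b.items():
        if feature in result:
            merged = unify(result[feature], b_value)
            if merged is None:
                return None  # feature clash: the whole unification fails
            result[feature] = merged
        else:
            # Features present in only one structure are carried over.
            result[feature] = b_value
    return result


# Toy lexical entries (hypothetical features): Dutch "de" combines with
# masculine/feminine nouns such as "auto", but not with neuter "huis".
det_de = {"agr": {"gender": "masc/fem"}}
noun_auto = {"agr": {"gender": "masc/fem", "num": "sg"}}
noun_huis = {"agr": {"gender": "neut", "num": "sg"}}

print(unify(det_de, noun_auto))  # {'agr': {'gender': 'masc/fem', 'num': 'sg'}}
print(unify(det_de, noun_huis))  # None
```

Full graph unification additionally lets two feature paths point to one shared node, so that a value bound through one path is automatically visible through the other; that mechanism is omitted here for brevity.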

Abstract

This paper presents an initial study performed with the MODOMA system. The MODOMA is a computational multi-agent laboratory environment for unsupervised language acquisition experiments in which acquisition is based on the interaction between two language models: an adult agent and a child agent. Although this framework employs statistical as well as rule-based procedures, the result of language acquisition is a knowledge-based language model, which can be used to generate and parse new utterances of the target language. The system is fully parametrized, and researchers can control all aspects of the experiments, while the results of language acquisition, that is, the acquired grammatical knowledge, are explicitly represented and can be consulted. Thus, this system introduces novel possibilities for conducting computational language acquisition experiments. The experiments presented in this paper demonstrate that functional and content categories can be acquired and represented by the daughter agent based on training and test data containing different numbers of exemplars generated by the adult agent. Interestingly, patterns that are well established for human-generated data are also found for these machine-generated data. As the procedures resulted in the successful acquisition of discrete grammatical categories by the child agent, these experiments substantiate the validity of the MODOMA approach to modelling language acquisition.
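
The Zipfian pattern can be made concrete: in a Zipfian distribution a word's frequency is roughly inversely proportional to its frequency rank, so a handful of very frequent types account for a large share of all tokens, and in natural language these types are predominantly function words. The sketch below turns this into one simple heuristic: the most frequent types that together cover a chosen share of the tokens are labelled functional and all remaining types content. It is an illustrative assumption, not the MODOMA procedure; the function name `split_by_frequency`, the `coverage` parameter and the toy corpus are hypothetical.

```python
from collections import Counter


def split_by_frequency(
    utterances: list[str], coverage: float = 0.5
) -> tuple[set[str], set[str]]:
    """Label the most frequent word types as functional until they cover
    `coverage` of all tokens; label every remaining type as content."""
    counts = Counter(word for u in utterances for word in u.lower().split())
    total = sum(counts.values())
    if total == 0:
        return set(), set()
    functional: set[str] = set()
    covered = 0
    # most_common() lists types in order of descending frequency.
    for word, freq in counts.most_common():
        if covered / total >= coverage:
            break
        functional.add(word)
        covered += freq
    content = set(counts) - functional
    return functional, content


# Toy input standing in for noun phrases generated by the adult agent.
corpus = [
    "de auto", "de man", "het huis", "de fiets", "het boek",
    "de vrouw", "een kat", "het kind", "de stoel", "een hond",
]
functional, content = split_by_frequency(corpus)
print(sorted(functional))  # ['de', 'een', 'het']
print(sorted(content))
```

On this small, hand-picked sample the split is clean. With realistic data the cut-off becomes a parameter to tune, and frequency alone would misclassify frequent content words, which is why such a heuristic would usually be combined with other distributional cues.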

Paper Structure

This paper contains 10 sections, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Example of the representation of the lexical item for de (‘the.masc/fem’) by Delilah
  • Figure 2: Example of the representation of an acquired lexical item for auto (‘car’) by the daughter agent
  • Figure 3: Representation of categories in the MODOMA versus conventional grammatical terminology
  • Figure 4: Boxplots depicting function and content words for the 10,000 NPs training data
  • Figure 5: Boxplots depicting function and content words for the 10,000 sentences training data
  • ...and 2 more figures