Discovering Properties of Inflectional Morphology in Neural Emergent Communication

Miles Gilberti, Shane Storks, Huteng Dai

TL;DR

This work reframes emergent communication as emergent morphology by imposing a small fixed alphabet and an inflection-like attribute-value (Attr-Val) construction with a high-cardinality root attribute and smaller grammatical attributes, enforcing double articulation via $|C|=8$ and $m=9$. It introduces metrics for concatenativity (e.g., HASLen, BPELen) and fusion (F-TopSim) alongside segmentation methods (HAS, BPE) and applies them to both artificial languages and natural-language inflection data (Spanish and Arabic). Key findings show that phonology-inspired articulation pressure fosters concatenative morphology, and that emergent languages tend to fuse grammatical features as natural languages do, though their topographic similarity to human languages remains limited. The framework enables direct, linguistically grounded comparisons between emergent and natural morphologies and points to morphophonological pressures as drivers of morphology in emergent communication. Together, the results offer a principled, adjustable platform for studying the emergence of morphology under small-vocabulary constraints and inflection-like structure.
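For readers unfamiliar with topographic similarity, the following is a minimal sketch of the measure as it is commonly defined in the emergent communication literature: the Spearman correlation between pairwise distances in meaning space and pairwise distances in message space. It is not the paper's F-TopSim, whose definition is not given in this section. The choice of Hamming distance for meanings and Levenshtein distance for messages, the function names, and the zero-variance convention are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attribute positions at which two meanings differ."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(s, t):
    """Edit distance between two messages (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant (assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def topsim(meanings, messages):
    """Spearman correlation between meaning distances and message distances."""
    pairs = list(combinations(range(len(meanings)), 2))
    meaning_d = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    message_d = [levenshtein(messages[i], messages[j]) for i, j in pairs]
    return pearson(average_ranks(meaning_d), average_ranks(message_d))


# Hypothetical toy data: two attributes with two values each.
toy_meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
concatenative = ["ac", "ad", "bc", "bd"]  # one symbol per attribute value
scrambled = ["aa", "bb", "ab", "ba"]      # no consistent symbol-to-value mapping
```

Calling `topsim(toy_meanings, concatenative)` and `topsim(toy_meanings, scrambled)` contrasts a language in which each attribute value has its own symbol with one in which it does not; the former scores higher under this definition.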

Abstract

Emergent communication (EmCom) with deep neural network-based agents promises to yield insights into the nature of human language, but remains focused primarily on a few subfield-specific goals and metrics that prioritize communication schemes which represent attributes with unique characters one-to-one and compose them syntactically. We thus reinterpret a common EmCom setting, the attribute-value reconstruction game, by imposing a small-vocabulary constraint to simulate double articulation, and formulating a novel setting analogous to naturalistic inflectional morphology (enabling meaningful comparison to natural language communication schemes). We develop new metrics and explore variations of this game motivated by real properties of inflectional morphology: concatenativity and fusion. Through our experiments, we discover that simulated phonological constraints encourage concatenative morphology, and emergent languages replicate the tendency of natural languages to fuse grammatical attributes.

Paper Structure

This paper contains 43 sections, 3 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: We reframe the typical attribute-value (Attr-Val) reconstruction game with an analogy to inflectional morphology, where roots $\ell$ and slots $\sigma$ (e.g., tense and person) comprise attributes which must be communicated through messages. Emergent communication schemes are then comparable to natural language inflections.
  • Figure 2: Artificial languages. Each language defines a unique symbol for each possible attribute value, but varies the composition operation for symbols.
  • Figure 3: BPELen at varying $|V|$ for emergent and natural languages in the 42 $\times$ 3 inflection Attr-Val setting. Comprehensive plots for the full range of inflection experiments, with natural language comparisons, are in Appendix \ref{apx:more nat langs infl}.
  • Figure 4: BPELen at varying $|V|$ for emergent languages in the default Attr-Val setting.
  • Figure 5: Topographic similarities of 42 $\times$ 2 $\times$ 3 emergent (without and with articulation pressure) and natural languages. See Appendix \ref{apx:more nat langs infl} for additional conditions.
  • ...and 6 more figures