Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning

Xiaodan Chen, Alexandre Pitti, Mathias Quoy, Nancy F Chen

TL;DR

The paper investigates how infants' imitation of speech during early development influences later bilingual language acquisition within the perceptual narrowing framework. It proposes a compact encoder–decoder with a self-organizing map and two generation modes: continual learning (predictive coding-based online learning) for early infancy imitation and compositional optimization (no-learning generation based on early constituents) for later imitation. Experiments with English as L1 and French/Chinese as L2 show that online continual learning yields better L2 imitation and preserves L1 performance, while L2 learning after a critical period is harder, aligning with perceptual narrowing. The approach emphasizes interpretability, online adaptability, and resilience to forgetting, with potential extensions to phonotactics and tonal languages.

Abstract

Understanding how infants perceive speech sounds and language structures is still an open problem. Previous research in artificial neural networks has mainly focused on generative models that depend on large datasets, aiming to replicate language-related phenomena such as "perceptual narrowing". In this paper, we propose a novel approach using a small generative neural network equipped with two mechanisms: a continual learning mechanism based on predictive coding for mono- and bilingual speech sound learning (referred to as language sound acquisition during the "critical period"), and a compositional optimization mechanism for generation where no learning is involved (later infancy sound imitation). Our model prioritizes interpretability and demonstrates the advantages of online learning: unlike deep networks requiring substantial offline training, our model continuously updates with new data, making it adaptable and responsive to changing inputs. Through experiments, we demonstrate that if second language acquisition occurs during later infancy, the challenges associated with learning a foreign language after the critical period are amplified, replicating the perceptual narrowing effect.
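To make the idea of online updating concrete, the following is a minimal sketch of one continual-learning step on a self-organizing map. It assumes the standard Kohonen update rule on a 1-D map, not the paper's own equations (which are not reproduced on this page). The function names, map size, feature dimension, learning rate, and neighbourhood width are all hypothetical choices made for illustration.

```python
import numpy as np

def som_online_step(weights, x, lr=0.1, sigma=1.0):
    """One online Kohonen update on a 1-D map (illustrative, not the paper's rule)."""
    # Best-matching unit: the prototype closest to the incoming frame.
    dists = np.linalg.norm(weights - x, axis=1)
    bmu = int(np.argmin(dists))
    # Gaussian neighbourhood over map positions, centred on the BMU.
    positions = np.arange(weights.shape[0])
    h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
    # Move every prototype toward x, scaled by its neighbourhood weight.
    new_weights = weights + lr * h[:, None] * (x - weights)
    return new_weights, bmu

def quantization_error(weights, x):
    """Distance from x to its closest prototype."""
    return float(np.min(np.linalg.norm(weights - x, axis=1)))

rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 13))  # 16 units, 13-dim frames (hypothetical sizes)
frame = rng.normal(size=13)          # stand-in for a single MFCC frame

err_before = quantization_error(weights, frame)
for _ in range(20):                  # the same sound heard repeatedly
    weights, _ = som_online_step(weights, frame)
err_after = quantization_error(weights, frame)
```

Each step uses only the current input, so no stored dataset or offline training pass is needed; that is the property the abstract contrasts with large offline-trained networks.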

Paper Structure

This paper contains 22 sections, 10 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Our proposed encoder-decoder network architecture. The yellow arrow denotes unsupervised learning, whereas the red one represents predictive coding [FRISTON20031325; Pitti2020].
  • Figure 2: Schema of the proposed modes. Left: Continual learning (CL) mode, which simulates early infancy sound imitation, where the model generates sounds heard in a reinforced way. The SOM layer $Y$ serves as a predictor and the reconstructed output is adjusted based on sensory input. Continual learning is implemented by updating the SOM's internal representation (Equation \ref{equation_y_x'}) based on new input data $X'$ over time, allowing the system to adapt its predictions to changes in the environment and improve prediction accuracy. This mode exhibits a simplified hierarchical structure within predictive coding, where $Z$ adapts similarity (Equation \ref{equation_Z}) to refine predictions. Right: Compositional optimization (CO) mode, which simulates later infancy sound generation and models our hypothesis that the ability to imitate sounds in later childhood is mainly influenced by the minimal input received during early infancy. Later-learned sounds are generated based on the compositional nature of sounds acquired during critical developmental periods.
  • Figure 3: Self-organised map training results
  • Figure 4: The trend of error between input MFCC and reconstructed MFCC for both training and test datasets (Fig. \ref{recon_error_train} and Fig. \ref{recon_error_test} respectively). The model employs a mechanism to adjust randomly generated Gaussian-type inputs to approximate the pattern of the reproduced sound to that of the heard sound, resulting in decreased error over time, as illustrated in Fig. \ref{pattern_error_train}.
  • Figure 5: Critical period: Analysis of performance trends within the left bars of Fig. \ref{modelComparaison_cn} and Fig. \ref{modelComparaison_fr}, corresponding to models solely learning English, reveals that CL outperforms CO. This suggests that learning L2 after the critical period presents greater challenges in achieving performance comparable to learning during the critical period, aligning with the hypothesis of perceptual narrowing. Minimal input: Comparing results within the blue boxes, errors decrease when L2 is learned, indicating that learning L2 further optimizes the model. A control group using L1 English as L2 was introduced (left box in each group, with the horizontal axis labeled 'en+en'). The disparities in reconstruction error under CO mode among L2 languages (Chinese $\approx$ French $\gg$ English) indicate that the reduction in error is more likely due to dataset quality rather than quantity.
  • ...and 2 more figures
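
The captions for Figures 2 and 4 describe a generation mode in which nothing is learned and a randomly generated Gaussian input is adjusted until the reproduced pattern approaches the heard one. The sketch below illustrates that general idea only: it freezes a set of prototype vectors and fits mixing coefficients by plain gradient descent on the squared reconstruction error. The linear mixing model, the function name, the sizes, and the step-size rule are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def compose_from_frozen_units(prototypes, target, steps=100, seed=0):
    """Fit mixing coefficients over frozen prototypes (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Random Gaussian starting code; only this vector is ever adjusted.
    coeffs = rng.normal(scale=0.1, size=prototypes.shape[0])
    # Step size derived from the largest singular value, so gradient
    # descent on this quadratic objective cannot diverge.
    lr = 0.5 / (np.linalg.norm(prototypes, 2) ** 2)
    errors = []
    for _ in range(steps):
        residual = coeffs @ prototypes - target
        errors.append(float(residual @ residual))
        # Gradient of the squared error with respect to the coefficients.
        coeffs = coeffs - lr * 2.0 * (prototypes @ residual)
    return coeffs, errors

rng = np.random.default_rng(1)
prototypes = rng.normal(size=(8, 13))  # units fixed after early learning (hypothetical)
frozen_copy = prototypes.copy()
target = rng.normal(size=13)           # stand-in for a later-heard sound frame

coeffs, errors = compose_from_frozen_units(prototypes, target)
```

Because the prototypes never change, the achievable error is bounded by what the early-acquired units can express, which is one way to picture why sounds outside the early repertoire are harder to imitate in this mode.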