Building a Non-native Speech Corpus Featuring Chinese-English Bilingual Children: Compilation and Rationale

Hiuchung Hung, Andreas Maier, Thorsten Piske

TL;DR

This paper addresses the scarcity of non-native child speech resources for L2 English by introducing kidsNARRATE, a corpus of 6.5 hours of English narratives from 50 Chinese-English bilingual children aged 5–6, with transcripts, word-level annotations of grammatical and pronunciation errors, and accompanying videos. It also presents a remote data-collection workflow built on accessible tools (ZOOM, OBS) and a multi-rater scoring system to ensure data quality. Alongside the English data, the corpus provides parallel L1 Chinese MAIN narratives as a reference, noise-controlled audio, and detailed annotations to support ASR development and L2 pedagogy, and it enables future analyses of language transfer and of emotional cues in the videos. The work offers a replicable template for remote data collection with young children and points to applications in automated language assessment and teacher-informed language modeling.

Abstract

This paper introduces a non-native speech corpus of narratives from fifty 5- to 6-year-old Chinese-English bilingual children. We present transcripts of 6.5 hours of children taking a narrative comprehension test in English (L2), together with human-rated scores and annotations of grammatical and pronunciation errors. For reference, the children also completed the parallel MAIN (Multilingual Assessment Instrument for Narratives) tests in Chinese (L1). All tests were recorded in audio and video using a remote collection method we developed ourselves. During transcription, the video recordings help offset the low intelligibility of L2 narratives produced by young children. The corpus is a resource for second language teaching and has the potential to improve the performance of automatic speech recognition (ASR) on non-native child speech.
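
The abstract mentions human-rated scores and the TL;DR a multi-rater scoring system, but neither says how the ratings are combined. One plausible scheme, sketched here under stated assumptions, is to average each recording's scores from two or more independent raters and flag recordings where the raters disagree by more than a chosen spread, so they can be adjudicated. The function name `aggregate_ratings`, the item IDs and the `max_spread` threshold are hypothetical and not taken from the paper.

```python
from statistics import mean


def aggregate_ratings(ratings: dict[str, list[float]], max_spread: float = 1.0):
    """Return (consensus, flagged) for a hypothetical multi-rater score table.

    ratings maps an item ID (e.g. one child's narrative) to the scores given by
    independent raters. consensus holds the mean score per item; flagged lists the
    items whose highest and lowest scores differ by more than max_spread, i.e. the
    items a team might send back for adjudication. Assumed scheme, not the paper's.
    """
    consensus: dict[str, float] = {}
    flagged: list[str] = []
    for item, scores in ratings.items():
        if len(scores) < 2:
            raise ValueError(f"item {item!r} needs ratings from at least two raters")
        consensus[item] = mean(scores)
        if max(scores) - min(scores) > max_spread:
            flagged.append(item)
    return consensus, flagged


if __name__ == "__main__":
    # Hypothetical scores from three raters for two narratives.
    ratings = {"child01_story": [7, 8, 7], "child02_story": [4, 7, 5]}
    consensus, flagged = aggregate_ratings(ratings)
    print(consensus)
    print(flagged)  # child02_story: spread of 3 exceeds max_spread of 1.0
```

Averaging is only one design choice; a project could instead take the median or resolve flagged items by discussion, and would typically also report an agreement statistic such as Cohen's or Fleiss' kappa.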

Paper Structure

This paper contains 19 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Example of the teacher's camera perspective
  • Figure 2: Example of the child's camera perspective
  • Figure 3: Recording positions
  • Figure 4: Example waveforms of the child and teacher audio channels with crosstalk: each microphone picked up both speakers. The wanted speech is shown in blue and the unwanted crosstalk in red.
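
Figure 4 shows that each microphone also captured the other speaker. How the authors handled this crosstalk is not described here; the following is a minimal sketch, assuming time-aligned child and teacher channels at the same sample rate, that attributes each frame to the speaker whose channel is clearly louder. The function names, the 400-sample frame (25 ms at an assumed 16 kHz), the 6 dB margin and the silence floor are illustrative choices, not parameters from the paper.

```python
import math
from typing import Sequence


def frame_rms(signal: Sequence[float], frame_len: int) -> list[float]:
    """RMS level of consecutive non-overlapping frames; a trailing partial frame is dropped."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def label_dominant_channel(child: Sequence[float], teacher: Sequence[float],
                           frame_len: int = 400, margin_db: float = 6.0,
                           floor: float = 1e-4) -> list[str]:
    """Label each frame 'child', 'teacher', 'overlap' or 'silence' (hypothetical helper).

    A frame is attributed to one speaker when that channel is at least margin_db
    louder than the other; the same frame in the quieter channel is then treated
    as crosstalk. Frames where both channels stay below floor count as silence.
    """
    if len(child) != len(teacher):
        raise ValueError("channels must be time-aligned and equally long")
    labels: list[str] = []
    for c, t in zip(frame_rms(child, frame_len), frame_rms(teacher, frame_len)):
        if max(c, t) < floor:
            labels.append("silence")
            continue
        # Level difference in dB; clamping to floor avoids log of zero.
        diff_db = 20 * math.log10(max(c, floor) / max(t, floor))
        if diff_db >= margin_db:
            labels.append("child")
        elif diff_db <= -margin_db:
            labels.append("teacher")
        else:
            labels.append("overlap")
    return labels


if __name__ == "__main__":
    # Synthetic constant-level frames: child speaks, then teacher, then both are silent.
    child = [0.5] * 400 + [0.05] * 400 + [0.0] * 400
    teacher = [0.05] * 400 + [0.5] * 400 + [0.0] * 400
    print(label_dominant_channel(child, teacher))
```

A level-difference rule like this is crude: it cannot separate genuinely overlapping speech, which ends up labeled "overlap" and would still need manual review or a proper source-separation step.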