Automatic Annotation of Grammaticality in Child-Caregiver Conversations

Mitja Nikolaus; Abhishek Agrawal; Petros Kaklamanis; Alex Warstadt; Abdellah Fourtassi

TL;DR

This work proposes a coding scheme for context-dependent grammaticality in child-caregiver conversations, annotates more than 4,000 utterances from a large corpus of transcribed conversations, and trains and evaluates a range of NLP models on these annotations.

Abstract

The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver conversations, tools for automatic annotation can offer an effective alternative to tedious manual annotation. We propose a coding scheme for context-dependent grammaticality in child-caregiver conversations and annotate more than 4,000 utterances from a large corpus of transcribed conversations. Based on these annotations, we train and evaluate a range of NLP models. Our results show that fine-tuned Transformer-based models perform best, achieving human inter-annotation agreement levels. As a first application and sanity check of this tool, we use the trained models to annotate a corpus almost two orders of magnitude larger than the manually annotated data and verify that children's grammaticality shows a steady increase with age. This work contributes to the growing literature on applying state-of-the-art NLP methods to help study child language acquisition at scale.

Paper Structure

This paper contains 31 sections, 1 equation, 4 figures, 4 tables.

Introduction
Contributions of this work
Related Work
Automatic Annotation of Grammaticality
Automatic Annotation of Children's Grammaticality in Conversation
Manual Annotation
Annotation Scheme
Grammaticality of Children's Utterances in Conversation
Grammatical Error Categories
Data
Manual Annotation Results
Automatic Annotation
Models
Results
Analyses
...and 16 more sections

Figures (4)

Figure 1: Mean and standard deviation of validation set PCC scores of DeBERTa as a function of the number of preceding utterances in the context.
Figure 2: Effect of training data size on test set PCC scores of DeBERTa. The plot displays performance for models trained on 20%, 40%, 60%, 80%, and 100% of the training data.
Figure 3: Recall scores for ungrammatical utterances with different error types. Error bars indicate 95% confidence intervals estimated using bootstrapping. The dotted line indicates the overall average Recall.
Figure 4: Proportion of grammatical, ambiguous, and ungrammatical utterances for transcripts in English CHILDES of children aged 2 to 5 years. Additionally, we display fitted logistic regression curves.
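
The Figure 3 caption mentions 95% confidence intervals estimated using bootstrapping. As a rough illustration of what that kind of estimate can look like, here is a minimal percentile-bootstrap sketch for a recall score. It is not the authors' code: the function names, the number of resamples, the percentile method, and the toy labels are all assumptions made for this example.

```python
import random


def recall(labels, preds, positive=1):
    """Recall for the `positive` class: TP / (TP + FN). NaN if no positives."""
    hits = [p == positive for l, p in zip(labels, preds) if l == positive]
    return sum(hits) / len(hits) if hits else float("nan")


def bootstrap_recall_ci(labels, preds, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for recall (hypothetical helper).

    Resamples (label, prediction) pairs with replacement, recomputes recall
    on each resample, and returns the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        r = recall([labels[i] for i in idx], [preds[i] for i in idx])
        if r == r:  # skip resamples with no positive examples (NaN)
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Toy data (made up): 1 = ungrammatical, 0 = grammatical.
labels = [1] * 50 + [0] * 50
preds = [1] * 40 + [0] * 10 + [0] * 50

point = recall(labels, preds)
lo, hi = bootstrap_recall_ci(labels, preds)
print(f"recall = {point:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Applying the same procedure separately to the utterances of each error category would give one interval per category, which is the shape of result the caption describes.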
