Mathemyths: Leveraging Large Language Models to Teach Mathematical Language through Child-AI Co-Creative Storytelling

Chao Zhang; Xuechen Liu; Katherine Ziska; Soobin Jeon; Chi-Lin Yu; Ying Xu

TL;DR

This work presents Mathemyths, a GPT-4-based co-creative storytelling system that teaches mathematical language to children aged 4-8 by embedding math terms such as "sum", "half", "add", "subtract", "equal", and "estimate" into evolving narratives. It details an iterative prompt-engineering pipeline that enables GPT-4 to generate questions, continue stories with in-context math explanations, and scaffold hesitant learners, and it compares AI and human story partners in a controlled study of 35 children. Results show learning gains in math language that are comparable between AI and human partners, with older children showing stronger improvements and the AI partner showing advantages on certain dimensions such as definition comprehension; engagement patterns differ by age and partner type. The paper discusses design implications for adaptive questioning, multimodal creativity, and embodied interaction, and addresses challenges around prompt reliability, hallucinations, and safety, highlighting paths for future work in child-oriented AI storytelling systems.

Abstract

Mathematical language is a cornerstone of a child's mathematical development, and children can effectively acquire this language through storytelling with a knowledgeable and engaging partner. In this study, we leverage recent advances in large language models to conduct free-form, creative conversations with children. To this end, we developed Mathemyths, a joint storytelling agent that takes turns co-creating stories with children while integrating mathematical terms into the evolving narrative. This paper details our development process, illustrating how prompt engineering can optimize LLMs for educational contexts. Through a user study involving 35 children aged 4-8 years, our results suggest that when children interacted with Mathemyths, their learning of mathematical language was comparable to that of children who co-created stories with a human partner. However, we observed differences in how children engaged with co-creation partners of different natures. Overall, we believe that LLM applications like Mathemyths offer children a unique conversational experience pertaining to focused learning objectives.

Paper Structure (56 sections, 9 figures, 3 tables)

Introduction
Related Work
Teaching Math Language through Storytelling
Conversational Interfaces for Children
Using Large Language Models for Child-facing Conversational Interfaces
The Development Process of Mathemyths
Design Principles
Prompt Engineering
Question Generation
Story Continuation
Scaffolding
Model Evaluation
Evaluating Prompt Engineered GPT-4's Performance in Question Generation
Evaluating Prompt Engineered GPT-4's Performance in Story Continuation
System Implementation
...and 41 more sections

Figures (9)

Figure 1: The interaction flow of the Mathemyths system.
Figure 2: Bar plots illustrating the distribution of data and the results from the ANOVA post-hoc Tukey's HSD test regarding the question generation evaluation. Statistically significant results are reported as $p < 0.05^{*}$, $p < 0.01^{**}$, $p < 0.001^{***}$. Error bars represent 95% confidence intervals (CIs).
Figure 3: The evaluation results on four metrics of story continuation.
Figure 4: The (a) plushy and (b) speaker used in our user study.
Figure 5: Box plots illustrating the data distribution and the results of a two-way repeated-measures mixed ANOVA for the pre-post-test, using condition and age group as covariates, in the mathematical language assessment. Statistically significant results are reported as $p < 0.05^{*}$, $p < 0.01^{**}$, $p < 0.001^{***}$.
...and 4 more figures