Building babyGPTs: Youth Engaging in Data Practices and Ethical Considerations through the Construction of Generative Language Models
Luis Morales-Navarro, Daniel J. Noh, Yasmin B. Kafai
TL;DR
This study explores whether youth can plausibly design generative language models (GLMs) by documenting a case study in which three teenagers build a babyGPT using a small-scale dataset derived from Marvel scripts. It analyzes how they engage in AI/ML data practices (data collection, quality control, preparation, and evaluation) and confront ethical issues such as copyright and attribution. The findings indicate that youth can actively participate in the construction and critique of GLMs, suggesting that construction-based approaches can enhance AI/ML literacies and warrant further development of accessible tools and curricula. The work contributes a feasible pathway for youth-led GLM design and outlines directions for deeper, iterative design experiences in K-12 settings.
Abstract
As generative language models (GLMs) have gained popularity, youth are increasingly using them in their everyday lives. Accordingly, most research has centered on supporting youth as users of GLM-powered systems. However, we know little about how to engage youth in the design of these models. Building on the rich legacy of child-computer interaction research that positions youth as designers of computing systems, we explore how to support young people in designing GLMs. Through a case study of three teenagers (ages 14-15) building a babyGPT screenplay generator, we illustrate how the team developed a model while engaging in data practices relevant to artificial intelligence/machine learning (AI/ML) and addressing ethical issues. This paper contributes a case study that demonstrates the feasibility of engaging youth in building GLMs.
