Detecting Gender Stereotypes in Scratch Programming Tutorials

Isabella Graßl, Benedikt Fein, Gordon Fraser

TL;DR

This paper tackles the persistence of gender stereotypes in Scratch programming tutorials by developing a dedicated framework to identify 'gender stereotype smells' across characters, content, instructions, and programming concepts. It builds an automated toolchain to evaluate 73 real tutorials and 16 LLM-generated projects, revealing that about one-fifth contain stereotype smells and that current LLMs struggle to detect them without structured guidance. While LLMs show potential to aid in generating more inclusive materials, their bias-detection performance is inconsistent, and the materials they generate often contain subtle stereotypes that are harder for educators to notice. The work offers actionable guidance for teachers to assess teaching content and highlights avenues for refining LLM-based generation and evaluation to foster more inclusive computing education.

Abstract

Gender stereotypes in introductory programming courses often go unnoticed, yet they can negatively influence young learners' interest and learning, particularly under-represented groups such as girls. Popular tutorials on block-based programming with Scratch may unintentionally reinforce biases through character choices, narrative framing, or activity types. Educators currently lack support in identifying and addressing such bias. With large language models (LLMs) increasingly used to generate teaching materials, this problem is potentially exacerbated by LLMs trained on biased datasets. However, LLMs also offer an opportunity to address this issue. In this paper, we explore the use of LLMs for automatically identifying gender-stereotypical elements in Scratch tutorials, thus offering feedback on how to improve teaching content. We develop a framework for assessing gender bias considering characters, content, instructions, and programming concepts. Analogous to how code analysis tools provide feedback on code in terms of code smells, we operationalise this framework using an automated tool chain that identifies *gender stereotype smells*. Evaluation on 73 popular Scratch tutorials from leading educational platforms demonstrates that stereotype smells are common in practice. LLMs are not effective at detecting them, but our gender bias evaluation framework can guide LLMs in generating tutorials with fewer stereotype smells.

Paper Structure

This paper contains 54 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Scratch programming tutorials with gender bias.
  • Figure 2: Prompt for RQ3. Placeholders in italics. To request inclusive projects, we include the statements from the checklist table (tab:checklist-stereotypes).
  • Figure 3: Prompt for RQ2. Placeholders in italics.
  • Figure 4: Overall framework scores for existing and generated Scratch tutorials. Both prompt variants in RQ3 were evaluated by human raters.
  • Figure 5: Comparison between human (h) and LLM (m) ratings for existing Scratch projects across framework categories.