GenAI Voice Mode in Programming Education

Sven Jacobs, Natalie Kiesler

TL;DR

The paper investigates real-time GenAI voice mode in programming education by analyzing audio dialogues from nine ninth-grade students who used a voice-enabled tutor while learning Python. It combines transcription and qualitative coding of student prompts and tutor responses with the Partner Modeling Questionnaire to characterize interaction patterns, feedback quality, and student perceptions. The tutor mainly gives feedback on mistakes and next steps, and students use it primarily for debugging, but only 71.4% of its 416 feedback outputs are correct, and problems with speaking code elements aloud further undermine reliability, which matters for the accessibility use case. The work highlights the need for fluent spoken rendering of code and for more reliable, context-aware feedback before such tools are deployed to diverse learners in classrooms.

Abstract

Real-time voice interfaces using multimodal Generative AI (GenAI) can potentially address the accessibility needs of novice programmers with disabilities (e.g., related to vision). Yet, little is known about how novices interact with GenAI tools and about the quality of their feedback in the form of audio output. This paper analyzes audio dialogues from nine 9th-grade students using a voice-enabled tutor (powered by OpenAI's Realtime API) in an authentic classroom setting while learning Python. We examined the students' voice prompts and the AI's responses (1210 messages) using qualitative coding. We also gathered students' perceptions via the Partner Modeling Questionnaire. The GenAI Voice Tutor primarily offered feedback on mistakes and next steps, but its correctness was limited (71.4% correct out of 416 feedback outputs). Quality issues were observed, particularly when the AI attempted to utter programming code elements. Students used the GenAI voice tutor primarily for debugging. They perceived it as competent, only somewhat human-like, and flexible. The present study is the first to explore the interaction dynamics of real-time voice GenAI tutors and novice programmers, informing future educational tool design and potentially addressing accessibility needs of diverse learners.

Paper Structure

This paper contains 20 sections, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Tutor Kai User Interface