Exploring Student Choice and the Use of Multimodal Generative AI in Programming Learning

Xinying Hou, Ruiwei Xiao, Runlong Ye, Michael Liut, John Stamper

TL;DR

This study investigates how undergraduate programming novices interact with multimodal generative AI tools during short programming tasks. Using 16 think-aloud sessions with Google AI Studio, the researchers examine which input and output modalities students choose (e.g., screen-sharing, speech, screenshots, audio transcripts) and the criteria guiding these choices, framed by cognitive load and multimedia learning theories. Key contributions include empirical insights into modality adoption patterns, an account of interaction dynamics that resemble human tutoring, and design implications for seamless modality switching and user control in CS education. The findings can help educators and tool developers tailor multimodal GenAI interfaces to diverse learner preferences and task demands in programming learning.

Abstract

The broad adoption of Generative AI (GenAI) is reshaping Computer Science education, and recent studies have identified both benefits and potential concerns when students use it for programming learning. However, most existing explorations focus on GenAI tools that primarily support text-to-text interaction. With recent developments, GenAI applications have begun to support multiple modes of communication, known as multimodality. In this work, we explored how undergraduate programming novices choose and work with multimodal GenAI tools, and the criteria behind their choices. We selected a commercially available multimodal GenAI platform because it supports multiple input and output modalities, including text, audio, image upload, and real-time screen-sharing. Through 16 think-aloud sessions that combined participant observation with follow-up semi-structured interviews, we investigated which modalities students chose when completing programming problems with GenAI tools and the criteria underlying those selections. As multimodal communication emerges as the future of AI in education, this work aims to spark continued exploration of how students interact with multimodal GenAI in CS education.

Paper Structure

This paper contains 23 sections and 2 figures.

Figures (2)

  • Figure 1: Two practice problem types in the study
  • Figure 2: Main AI interaction modalities in this study. In Chat, students can type input or upload images and documents, and the AI processes this input into text output. In Stream, students can speak through a microphone, share their screen, type, or upload files, and the AI then processes this information stream to produce dynamic output consisting of audio and a transcript.