Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving

Steven-Shine Chen, Jimin Lee, Paul Pu Liang

TL;DR

The paper addresses the limitations of text-only tutoring by introducing Interactive Sketchpad, a multimodal tutoring system that couples step-by-step textual hints with executable visual diagrams generated from Python code. Built atop a pre-trained large multimodal model and fine-tuned for tutoring, it supports natural multimodal interaction through a chatbot and an interactive whiteboard, enabling iterative, collaborative problem solving in geometry, calculus, and trigonometry. User studies with university students show improvements in task understanding, problem-solving accuracy, and engagement, supporting the efficacy of combining visuals with textual guidance. The work demonstrates how code-driven visualizations, when integrated with multimodal interaction, can transform educational technologies and support robust human-AI collaboration in learning.

Abstract

Humans have long relied on visual aids like sketches and diagrams to support reasoning and problem-solving. Visual tools, like auxiliary lines in geometry or graphs in calculus, are essential for understanding complex ideas. However, many tutoring systems remain text-based, providing feedback only through natural language. Leveraging recent advances in Large Multimodal Models (LMMs), this paper introduces Interactive Sketchpad, a tutoring system that combines language-based explanations with interactive visualizations to enhance learning. Built on a pre-trained LMM, Interactive Sketchpad is fine-tuned to provide step-by-step guidance in both text and visuals, enabling natural multimodal interaction with the student. Accurate and robust diagrams are generated by incorporating code execution into the reasoning process. User studies conducted on math problems in geometry, calculus, and trigonometry demonstrate that Interactive Sketchpad leads to improved task comprehension, problem-solving accuracy, and engagement levels, highlighting its potential for transforming educational technologies. All code is available at: https://stevenshinechen.github.io/interactivesketchpad/.

Paper Structure

This paper contains 22 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of Interactive Sketchpad: Given a multimodal question, Interactive Sketchpad generates a program to create a visual aid, then uses the visual aid as part of a hint to help the user solve the problem. The visual aid is sent to the interactive whiteboard which the user can write and draw on before sending the annotated diagram back to receive feedback or further help.
  • Figure 2: This flowchart demonstrates the interaction between user inputs, large multimodal models, and visualization components. The system processes user inputs, generates textual hints, interprets and executes code, and provides visualizations to guide problem-solving. The user can also annotate visualizations using the interactive whiteboard component.
  • Figure 3: Screenshot of the chatbot interface for Interactive Sketchpad. The user can view the generated visual hints and interact with Interactive Sketchpad by typing messages and uploading images.
  • Figure 4: Screenshot of the interactive whiteboard component of Interactive Sketchpad. The user can annotate the visualization generated by Interactive Sketchpad; during the user studies, annotation was done with an iPad and Apple Pencil.
  • Figure 5: To check the value of an integral, Interactive Sketchpad uses the trapezoidal rule to perform numerical integration by writing a Python program using scipy.integrate.
  • ...and 2 more figures
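
The numerical check described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the program the system generates: the caption mentions scipy.integrate (which provides a `trapezoid` function for sampled data), whereas the version below writes the trapezoidal rule out by hand so it runs without dependencies. The function names, the example integrand, and the tolerance are all assumptions made for illustration.

```python
import math
from typing import Callable


def trapezoid_integral(f: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with the composite trapezoidal rule."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # Endpoints carry weight 1/2; interior sample points carry weight 1.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def check_claimed_integral(f: Callable[[float], float], a: float, b: float,
                           claimed: float, tol: float = 1e-3) -> bool:
    """Return True if a claimed value agrees with the numerical estimate within tol.

    Hypothetical helper: the tolerance is an illustrative choice, not a value from the paper.
    """
    return math.isclose(trapezoid_integral(f, a, b), claimed, abs_tol=tol)


if __name__ == "__main__":
    # Hypothetical example: a student claims that the integral of x^2 on [0, 3] equals 9.
    print(check_claimed_integral(lambda x: x * x, 0.0, 3.0, claimed=9.0))
```

A check of this kind lets a tutor confirm or question a student's answer numerically without deriving the antiderivative symbolically, which is the role the caption attributes to the generated program.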