TalkSketch: Multimodal Generative AI for Real-time Sketch Ideation with Speech
Weiyan Shi, Sunaya Upadhyay, Geraldine Quek, Kenny Tsu Wei Choo
TL;DR
TalkSketch addresses the challenge of translating evolving visual ideas into text prompts during early-stage design by integrating freehand sketching and real-time speech into a unified multimodal AI chatbot. The system combines a Fabric.js sketch canvas, a Talking module that produces live transcripts, and a Gemini-based AI Insights chatbot that offers kickoff and refine prompts, enabling proactive, context-aware guidance. A formative study with six designers indicates that traditional text prompting disrupts creative flow, and suggests that combining speech with sketching could improve alignment and ideation flow. The work demonstrates the potential of conversational multimodal interfaces to support fluid design workflows and outlines future directions, including controlled evaluations and broader use cases such as education and live demos.
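To make the data flow above concrete, the following TypeScript sketch shows one way a single chatbot turn could bundle a canvas snapshot with the live transcript. It is a minimal illustration under stated assumptions, not the authors' implementation. The names `buildRequest`, `parseDataUrl`, `Mode`, and `INSTRUCTIONS` are hypothetical, as are the instruction strings. The kickoff/refine distinction is modeled only as different instruction text. The request body assumes the `contents`/`parts`/`inline_data` layout of the Gemini REST `generateContent` request.

```typescript
// Hypothetical sketch (not the authors' code): bundling one canvas snapshot and
// one speech transcript into a single multimodal chatbot turn.

// Assumed modes, named after the TL;DR's "kickoff" and "refine" prompts.
type Mode = "kickoff" | "refine";

// Assumed request shape, modeled on the Gemini REST generateContent body.
interface Part {
  text?: string;
  inline_data?: { mime_type: string; data: string };
}

interface GenerateContentRequest {
  contents: { role: "user"; parts: Part[] }[];
}

// Placeholder instructions per mode; the real prompts are not given in the source.
const INSTRUCTIONS: Record<Mode, string> = {
  kickoff: "Suggest a few starting directions for the idea in this sketch.",
  refine: "Suggest concrete refinements to the current sketch.",
};

// Split "data:image/png;base64,..." into its MIME type and base64 payload.
function parseDataUrl(dataUrl: string): { mimeType: string; data: string } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (match === null) {
    throw new Error("expected a base64-encoded data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// Build one user turn: mode instruction, spoken context, then the canvas image.
function buildRequest(
  mode: Mode,
  transcript: string,
  canvasDataUrl: string,
): GenerateContentRequest {
  const { mimeType, data } = parseDataUrl(canvasDataUrl);
  const parts: Part[] = [
    { text: INSTRUCTIONS[mode] },
    { text: `Designer's spoken description: ${transcript.trim()}` },
    { inline_data: { mime_type: mimeType, data } },
  ];
  return { contents: [{ role: "user", parts }] };
}

// Example turn with a truncated placeholder image payload.
const request = buildRequest(
  "refine",
  "  make the handle thicker ",
  "data:image/png;base64,iVBORw0KGgo=",
);
console.log(JSON.stringify(request.contents[0].parts[1]));
// prints {"text":"Designer's spoken description: make the handle thicker"}
```

In a browser front end, the data URL could come from the Fabric.js canvas's `toDataURL` export and the transcript from the Talking module. Sending the request (endpoint, model name, credentials) is omitted because the source does not specify it. Placing the spoken description before the image keeps the designer's verbal intent adjacent to the instruction it qualifies; this ordering is a design assumption, not a documented requirement.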
Abstract
Sketching is a widely used medium for generating and exploring early-stage design concepts. While generative AI (GenAI) chatbots are increasingly used for idea generation, designers often struggle to craft effective prompts and find it difficult to express evolving visual concepts through text alone. In a formative study (N=6), we examined how designers use GenAI during ideation and found that text-based prompting disrupts creative flow. To address these issues, we developed TalkSketch, an embedded multimodal AI sketching system that integrates freehand drawing with real-time speech input. TalkSketch aims to support a more fluid ideation process by capturing verbal descriptions during sketching and generating context-aware AI responses. Our work highlights the potential of GenAI tools to engage with the design process itself rather than focusing on outputs.
