TalkSketch: Multimodal Generative AI for Real-time Sketch Ideation with Speech

Weiyan Shi, Sunaya Upadhyay, Geraldine Quek, Kenny Tsu Wei Choo

TL;DR

TalkSketch addresses the challenge of translating evolving visual ideas into prompts during early-stage design by integrating freehand sketching with real-time speech into a unified multimodal AI chatbot. The system combines a Fabric.js sketch canvas, a Talking module for live transcripts, and a Gemini-based AI Insights chatbot that provides kickoff and refine prompts, enabling proactive, context-aware guidance. A formative study with six designers shows that traditional text prompts disrupt flow and that a speech-sketch modality can improve alignment and ideation flow. The work demonstrates the potential of conversational multimodal interfaces to support fluid design workflows and outlines future controlled evaluations and broader use cases such as education and live demos.

Abstract

Sketching is a widely used medium for generating and exploring early-stage design concepts. While generative AI (GenAI) chatbots are increasingly used for idea generation, designers often struggle to craft effective prompts and find it difficult to express evolving visual concepts through text alone. In a formative study (N=6), we examined how designers use GenAI during ideation and found that text-based prompting disrupts creative flow. To address these issues, we developed TalkSketch, an embedded multimodal AI sketching system that integrates freehand drawing with real-time speech input. TalkSketch aims to support a more fluid ideation process by capturing verbal descriptions during sketching and generating context-aware AI responses. Our work highlights the potential of GenAI tools to engage with the design process itself rather than focusing on output.

Paper Structure

This paper contains 31 sections, 2 figures.

Figures (2)

  • Figure 1: Overview of the TalkSketch system interface. (a) Sketching module: Users draw product concepts (e.g., a toaster) using stylus input on the canvas. The interface includes a sketch gallery, drawing toolbar, and a button to launch the Multimodal AI Chatbot. (b) Talking module: Voice recording captures the user's thinking aloud during sketching. (c) AI Insight panel: Shows automatic feedback based on the sketch and spoken transcript. (d) Text and image generation interface for multimodal interaction with the AI. Together, (c) and (d) constitute the Multimodal AI Chatbot module.
  • Figure 2: The workflow of the TalkSketch system. The process starts with Sketching with Talking, where users draw freely on a tablet while voicing their ideas. These inputs are channelled into the Multimodal AI Chatbot, which generates AI Insights based on the user's sketch and transcript. Users then engage in exploration via Multimodal Interaction, with text or image generation to improve their design ideas. Finally, users can export AI-generated images back to the canvas for further sketching and refinement.
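
The step in Figure 2 where the sketch and the transcript are channelled into the chatbot can be illustrated with a minimal TypeScript sketch. It is not the authors' implementation: every type name, function name, and instruction string below is hypothetical, and the rule that a "kickoff" prompt is used for the first insight and a "refine" prompt afterwards is an assumption about how the two prompt types might be applied. The sketch only assembles a request object from a canvas data URL and transcript segments; it does not call any model API.

```typescript
// Hypothetical sketch: none of these names come from the TalkSketch paper.
type PromptKind = "kickoff" | "refine";

interface TranscriptSegment {
  text: string;
  startMs: number; // offset from the start of the voice recording
}

interface InsightRequest {
  kind: PromptKind;
  instruction: string;
  transcript: string;
  imageBase64: string;
  mimeType: string;
}

// Placeholder instruction texts (assumed, not taken from the paper).
const INSTRUCTIONS: Record<PromptKind, string> = {
  kickoff: "Describe the sketch and suggest directions to explore.",
  refine: "Suggest refinements based on the updated sketch and narration.",
};

// Split a base64 data URL (for example, the string a canvas export produces)
// into its MIME type and raw base64 payload.
function splitDataUrl(dataUrl: string): { mimeType: string; base64: string } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error("expected a base64 data URL");
  }
  return { mimeType: match[1], base64: match[2] };
}

// Combine the current sketch and the spoken transcript into one request.
// Assumption: the first request is a "kickoff", later ones are "refine".
function buildInsightRequest(
  sketchDataUrl: string,
  segments: TranscriptSegment[],
  hasPriorInsight: boolean
): InsightRequest {
  const { mimeType, base64 } = splitDataUrl(sketchDataUrl);
  // Order segments by time, drop empty ones, and join into a single string.
  const transcript = [...segments]
    .sort((a, b) => a.startMs - b.startMs)
    .map((s) => s.text.trim())
    .filter((t) => t.length > 0)
    .join(" ");
  const kind: PromptKind = hasPriorInsight ? "refine" : "kickoff";
  return {
    kind,
    instruction: INSTRUCTIONS[kind],
    transcript,
    imageBase64: base64,
    mimeType,
  };
}
```

In a browser setting, the `sketchDataUrl` argument could come from the canvas export (Fabric.js canvases provide a `toDataURL` method), and the resulting object would then be mapped onto whatever request format the chosen multimodal model expects.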