DesignMinds: Enhancing Video-Based Design Ideation with Vision-Language Model and Context-Injected Large Language Model

Tianhao He, Andrija Stankovic, Evangelos Niforatos, Gerd Kortuem

TL;DR

This paper presents DesignMinds, a prototype that integrates a state-of-the-art Vision-Language Model (VLM) with a context-enhanced Large Language Model (LLM) to support ideation in video-based design (VBD). In a study with design practitioners, the prototype significantly enhanced the flexibility and originality of ideation while also increasing task engagement.

Abstract

Ideation is a critical component of video-based design (VBD), where videos serve as the primary medium for design exploration and inspiration. The emergence of generative AI offers considerable potential to enhance this process by streamlining video analysis and facilitating idea generation. In this paper, we present DesignMinds, a prototype that integrates a state-of-the-art Vision-Language Model (VLM) with a context-enhanced Large Language Model (LLM) to support ideation in VBD. To evaluate DesignMinds, we conducted a between-subject study with 35 design practitioners, comparing its performance to a baseline condition. Our results demonstrate that DesignMinds significantly enhances the flexibility and originality of ideation, while also increasing task engagement. Importantly, the introduction of this technology did not negatively impact user experience, technology acceptance, or usability.

Paper Structure

This paper contains 27 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: DesignMinds consists of two primary components: the backend and the front-end. The backend includes a VLM and an LLM integrated with a design knowledge repository. The front-end features a video playback region alongside a conversational window. The videos are first processed to extract key terms (highlighted in pink in the video description), which are then connected into a comprehensive description (blue in the video description) using built-in language linking functions. These complete descriptions are then passed to the LLM, along with a knowledge repository enriched by design books selected through a committee vote. Designers can then watch the video playback in the front-end to enhance trust and grounding in the design context, and engage in ideation through the conversational window.
  • Figure 2: The interface of DesignMinds primarily features a video player on the left and an LLM conversation window in the center. To facilitate organized recording of ideation in the later study, we additionally included a note-taking space below a description of the VBD tasks for recording participants' divergent thinking during the study tasks (see Supplementary Text \ref{instruction text} for the detailed text). When designers use DesignMinds, the system first performs a background pre-analysis of the video content on the left and then passes that content to the chat interface in the center. Designers subsequently interact via chat and record their inspiration as Divergent Thinking notes on the right.
  • Figure 3: During the study, participants were initially asked to familiarize themselves with both the environment and DesignMinds (Testing). They received instructions on the components of the prototype and how to interact with it. Following this, participants completed consent and demographic forms for background information. They were then provided with instructions for the tasks (Preparation). Participants were randomly divided into two groups: the experimental group, which interacted with the chatbot DesignMinds, and the control group, where participants continued their usual practice for design inspiration. Each participant group was assigned two tasks with different design contexts, presented in a counterbalanced order. In the next session (Post Session), participants were asked to complete the UEQ and UTAUT questionnaires. Finally, they were interviewed for about 5 minutes on three topics: overall experience, typical ideation process, and their attitudes towards AI.
  • Figure 4: Radar chart depicting the evaluation scores of design thinking across raters for the experimental and control groups. Errors are indicated by shaded regions. Attributes marked with asterisks (* or **) represent significant differences. * denotes .01 < p < .05, and ** denotes p < .001.
  • Figure 5: Plots displaying the average pupil dilation (\ref{fig:pupil}), fixation rate and duration (\ref{fig:fixation}), average blink rate and duration (\ref{fig:blink}), and average saccade rate and velocity (\ref{fig:saccade}) for the experimental and control groups. Accompanying histograms with error bars are also provided for each measure. Attributes and subplots marked with asterisks (* or **) represent significant differences. * denotes .01 < p < .05, and ** denotes p < .001.
  • ...and 1 more figure
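
The backend flow described in the Figure 1 caption (key terms linked into a video description, then passed to the LLM together with material from a design knowledge repository) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the names `Passage`, `link_key_terms`, `select_passages`, and `build_prompt`, the word-overlap ranking rule, and the prompt layout are all assumptions, and the actual VLM and LLM calls are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Passage:
    """One excerpt from the design knowledge repository (hypothetical structure)."""
    source: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_key_terms(key_terms: list[str]) -> str:
    """Join VLM key terms into one description (stand-in for the language-linking step)."""
    return "The video shows " + ", ".join(key_terms) + "."


def select_passages(description: str, repository: list[Passage], top_k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the description (assumed selection rule)."""
    words = tokenize(description)
    scored = [(len(words & tokenize(p.text)), p) for p in repository]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(description: str, passages: list[Passage], question: str) -> str:
    """Assemble the context-injected prompt that would be sent to the LLM."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return (
        f"Video description:\n{description}\n\n"
        f"Design knowledge:\n{context}\n\n"
        f"Designer question:\n{question}"
    )


# Hypothetical inputs: key terms as a VLM might emit them, plus a toy repository.
key_terms = ["a person chopping vegetables", "a cluttered kitchen counter"]
repository = [
    Passage("Book A", "Observe how people arrange tools on a kitchen counter while cooking"),
    Passage("Book B", "Typography choices shape brand perception"),
    Passage("Book C", "Chopping tasks reveal grip and posture problems"),
]

description = link_key_terms(key_terms)
selected = select_passages(description, repository)
prompt = build_prompt(description, selected, "What could be improved in this kitchen?")
print(prompt)
```

Keeping description building, passage selection, and prompt assembly as separate functions means each stage can be replaced (for example, swapping the word-overlap rule for embedding-based retrieval) without touching the others.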