Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models

David Chuan-En Lin; Nikolas Martelaro

TL;DR

Jigsaw enhanced designers’ understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.

Abstract

Recent advancements in AI foundation models have made it possible for them to be utilized off-the-shelf for creative tasks, including ideating design concepts or generating visual prototypes. However, integrating these models into the creative process can be challenging as they often exist as standalone applications tailored to specific tasks. To address this challenge, we introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to represent foundation models. Jigsaw allows designers to combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten designers and distilled design goals. In a user study, we showed that Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.

Paper Structure (45 sections, 8 figures, 5 tables)

Introduction
Related Work
AI Foundation Models
Visual Programming Interfaces
Designer-AI Interaction
Formative Study
Participants and Procedure
Findings and Discussion
C1: Limited Knowledge of AI Capabilities
C2: Tedious to be AI-friendly
C3: Difficult to Combine Multiple Models
C4: Slow Prototyping and Iteration
Design Goals
D1: Catalog of AI Foundation Models
D2: User-friendly instead of AI-friendly
...and 30 more sections

Figures (8)

Figure 1: Users can search for model pieces by describing their task in the semantic search bar (a). Users can hover over a model piece to view a description of its capability, typical runtime, and an example input and output (b).
Figure 2: The translation glue piece converts a piece of text into a prompt format suitable for text-to-x generation models (a). The ideation glue piece generates an idea for a design task (b).
Figure 3: Users can drag puzzle pieces from the Catalog Panel onto the Assembly Panel (a), select pieces on the Assembly Panel by clicking on them (b), and remove pieces by dragging them to the trash bin or pressing the delete key (c). Users can duplicate pieces, and undo and redo actions using hotkeys (d-f).
Figure 4: When the user drags a puzzle piece close to another compatible piece, Jigsaw displays a semi-transparent preview of the potential connection. If the user releases the puzzle piece, it will snap into place (a). Conversely, if the user attempts to connect a puzzle piece to an incompatible piece, the new piece will be repelled, ensuring that users do not force a fit (b). Users can move multiple puzzle pieces simultaneously (c).
Figure 5: Text inputs can be directly typed into the Input Panel (a). Image, video, 3D, and audio inputs can be uploaded either by drag-and-drop or the file browser (b). Sketch inputs can be drawn (c). Text outputs can be viewed and copied by the user (d). Image, video, 3D, audio, and sketch outputs can be viewed in their respective media viewers and downloaded by the user (e-i).
...and 3 more figures
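
The snap-or-repel behavior described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the `Piece` class, the `can_connect` and `validate_chain` helpers, and the rule that two pieces are compatible when the upstream output modality equals the downstream input modality are all assumptions made for illustration. The source states only that compatible pieces snap together and incompatible ones are repelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """Hypothetical puzzle piece: a model with one input and one output modality."""
    name: str
    input_modality: str
    output_modality: str


def can_connect(upstream: Piece, downstream: Piece) -> bool:
    # Assumed rule: pieces fit when the upstream output modality
    # matches the downstream input modality.
    return upstream.output_modality == downstream.input_modality


def validate_chain(pieces: list[Piece]) -> bool:
    # A chain is valid when every adjacent pair of pieces fits.
    return all(can_connect(a, b) for a, b in zip(pieces, pieces[1:]))


# Illustrative pieces; names and modalities are assumptions, not the Jigsaw catalog.
translation_glue = Piece("translation glue", "text", "text")
text_to_image = Piece("text-to-image", "text", "image")
image_to_video = Piece("image-to-video", "image", "video")

chain = [translation_glue, text_to_image, image_to_video]
print(validate_chain(chain))
print(can_connect(image_to_video, text_to_image))
```

Under this assumed rule, the first chain would snap together end to end, while attaching a text-input piece after a video-output piece would be rejected.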
