
Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens

Xinyu Shi, Yinghou Wang, Ryan Rossi, Jian Zhao

TL;DR

Brickify addresses the problem that natural-language prompts struggle to convey precise visual design intent. It introduces a visual-centric workflow that converts reference images into reusable design tokens (subject, color, style, concept) and lets designers directly manipulate those tokens to build a visual lexicon that guides AI generation. In iterative studies with designers, Brickify improved clarity and control when expressing complex visual relationships, and it sped up refinement when designers had a clear target in mind. The approach argues for interactive, token-based design and opens avenues for richer token ecosystems, broader modalities, and bi-directional collaboration between humans and AI in graphic design. More broadly, the work advances human-AI co-creation by offering a concrete, reusable framework for specifying and executing visual intent.

Abstract

Expressing design intent with natural language prompts requires designers to verbalize ambiguous visual details concisely, which can be challenging or even impossible. To address this, we introduce Brickify, a visual-centric interaction paradigm: expressing design intent through direct manipulation on design tokens. Brickify extracts visual elements (e.g., subject, style, and color) from reference images and converts them into interactive, reusable design tokens that can be directly manipulated (e.g., resized, grouped, or linked) to form a visual lexicon. The lexicon captures users' intent about both which visual elements are desired and how they should be composed into a whole. We developed Brickify to demonstrate how AI models can interpret and execute the visual lexicon through an end-to-end pipeline. In a user study, experienced designers found Brickify more efficient and intuitive than text-based prompts, allowing them to describe visual details, explore alternatives, and refine complex designs with greater ease and control.
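To make the token-and-lexicon idea more concrete, the following is a minimal Python sketch of how design tokens and a visual lexicon might be represented. It is an illustration under stated assumptions, not Brickify's implementation: the class names (`TokenKind`, `DesignToken`, `VisualLexicon`), their fields, the image file names, and the meaning assigned to `group` and `link` are all hypothetical. Only the token categories (subject, color, style, concept) and the resize, group, and link operations come from the text. The sketch stores *what* is wanted as a set of tokens and *how* it fits together as token geometry plus explicit groups and links.

```python
from dataclasses import dataclass, field
from enum import Enum


class TokenKind(Enum):
    # Token categories named in the paper; this enum itself is hypothetical.
    SUBJECT = "subject"
    COLOR = "color"
    STYLE = "style"
    CONCEPT = "concept"


@dataclass
class DesignToken:
    # Hypothetical fields: provenance plus placement on the manipulation canvas.
    token_id: str
    kind: TokenKind
    source_image: str  # reference image the element was extracted from
    x: float = 0.0
    y: float = 0.0
    width: float = 1.0
    height: float = 1.0


@dataclass
class VisualLexicon:
    # "What": the tokens. "How": their geometry plus explicit groups and links.
    tokens: dict[str, DesignToken] = field(default_factory=dict)
    groups: list[set[str]] = field(default_factory=list)
    links: list[tuple[str, str]] = field(default_factory=list)

    def add(self, token: DesignToken) -> None:
        self.tokens[token.token_id] = token

    def resize(self, token_id: str, scale: float) -> None:
        # Scale a token in place; relative size is assumed to signal prominence.
        if scale <= 0:
            raise ValueError("scale must be positive")
        tok = self.tokens[token_id]
        tok.width *= scale
        tok.height *= scale

    def group(self, *token_ids: str) -> None:
        # Assumed semantics: grouped tokens are treated as one unit.
        missing = [t for t in token_ids if t not in self.tokens]
        if missing:
            raise KeyError(f"unknown tokens: {missing}")
        self.groups.append(set(token_ids))

    def link(self, a: str, b: str) -> None:
        # Assumed semantics: token a (e.g., a style) applies to token b.
        for t in (a, b):
            if t not in self.tokens:
                raise KeyError(f"unknown token: {t}")
        self.links.append((a, b))


# Usage: hypothetical tokens echoing the owl and car example in Figure 3.
lex = VisualLexicon()
lex.add(DesignToken("owl", TokenKind.SUBJECT, "ref_owl.png", width=40, height=40))
lex.add(DesignToken("car", TokenKind.SUBJECT, "ref_car.png", x=100, width=20, height=20))
lex.add(DesignToken("watercolor", TokenKind.STYLE, "ref_style.png"))
lex.resize("owl", 2.0)
lex.group("owl", "car")
lex.link("watercolor", "owl")
```

A downstream generator would then read both the token set and these recorded relations; how Brickify actually consumes its lexicon is described by its own pipeline, not by this sketch.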

Paper Structure

This paper contains 83 sections, 17 figures, and 2 tables.

Figures (17)

  • Figure 1: Design example of a Halloween party poster, showing (a) the color palette, (b–f) reference images with highlighted elements, and (g) the poster envisioned in the designer's mind. (h) illustrates the spatial relationships among the pumpkins, building, and moon. We have obtained the designer's consent to include this design in the paper.
  • Figure 2: The definition of design tokens in Brickify: visual, textual, and imaginative tokens. Each token type has its own appearance and life cycle.
  • Figure 3: Demonstration of exploring different compositions through direct manipulation on design tokens. (a)–(d) show how adjusting sizes and positions of the owl and car tokens changes their relationships in the outcomes.
  • Figure 4: User interface of Brickify, consisting of three panels: (A) the Mood Board Panel for arranging reference images and creating persistent design tokens (subject, color, style, concept), which can be dragged and dropped (b1) into (B) the Token Manipulation Panel for direct manipulation (b2–b6). After the user clicks the Generate button (c), the generated results are organized in (C) the History Panel.
  • Figure 5: The technical pipeline of Brickify interprets and executes the visual lexicon step by step, using off-the-shelf methods.
  • ...and 12 more figures
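Figures 1(h) and 3 indicate that token size and position carry compositional intent. As a minimal sketch, assuming each token is an axis-aligned box on the canvas with y growing downward, the hypothetical `spatial_relations` function below reads coarse pairwise relations such as "left of" or "larger than" directly off the layout. The `TokenBox` type, the relation vocabulary, and the `size_ratio` threshold are illustrative choices, not Brickify's rules; the paper's pipeline (Figure 5) relies on off-the-shelf methods that are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBox:
    # Hypothetical token geometry on the manipulation canvas (y grows downward).
    name: str
    x: float
    y: float
    width: float
    height: float

    @property
    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)

    @property
    def area(self) -> float:
        return self.width * self.height


def spatial_relations(a: TokenBox, b: TokenBox, size_ratio: float = 1.5) -> list[str]:
    """Describe how token a relates to token b using layout alone.

    The vocabulary and the size_ratio threshold are illustrative assumptions.
    """
    relations: list[str] = []
    (ax, ay), (bx, by) = a.center, b.center
    dx, dy = bx - ax, by - ay
    # The dominant axis of the center offset decides the positional relation.
    if dx == 0 and dy == 0:
        relations.append(f"{a.name} centered on {b.name}")
    elif abs(dx) >= abs(dy):
        side = "left of" if dx > 0 else "right of"
        relations.append(f"{a.name} {side} {b.name}")
    else:
        side = "above" if dy > 0 else "below"
        relations.append(f"{a.name} {side} {b.name}")
    # Relative area: a clearly larger token is read as more prominent.
    if a.area >= size_ratio * b.area:
        relations.append(f"{a.name} larger than {b.name}")
    elif b.area >= size_ratio * a.area:
        relations.append(f"{a.name} smaller than {b.name}")
    return relations


# Usage: enlarging the car token flips the size relation but not the ordering.
owl = TokenBox("owl", x=0, y=0, width=40, height=40)
small_car = TokenBox("car", x=100, y=10, width=20, height=20)
big_car = TokenBox("car", x=100, y=10, width=80, height=80)
print(spatial_relations(owl, small_car))  # ['owl left of car', 'owl larger than car']
print(spatial_relations(owl, big_car))    # ['owl left of car', 'owl smaller than car']
```

Resizing one token changes the derived relations without any text being rewritten, which is the kind of layout-driven change Figure 3 illustrates; a real system would also need to handle more than two tokens and ambiguous overlaps.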