CPT: Controllable and Editable Design Variations with Language Models

Karthik Suresh, Amine Ben Khalifa, Li Zhang, Wei-ting Hsu, Fangzheng Wu, Vinay More, Asim Kadav

Abstract

Creating visually diverse, high-quality designs remains a manual, time-consuming process, which limits scalability and personalization in creative workflows. We present a system for generating editable design variations using a decoder-only language model, the Creative Pre-trained Transformer (CPT), trained to predict visual style attributes in design templates. At the core of our approach is a new representation called Creative Markup Language (CML), a compact, machine-learning-friendly format that captures canvas-level structure, page layout, and element-level details (text, images, and vector graphics), including both content and style. We fine-tune CPT on a large corpus of design templates authored by professional designers, enabling it to learn meaningful, context-aware predictions for attributes such as color schemes and font choices. The model produces semantically structured and stylistically coherent outputs that preserve internal consistency across elements. Unlike generative image models, our system yields fully editable design documents rather than pixel-only images, allowing users to iterate and personalize within a design editor. In experiments, our approach generates contextual color and font variations for existing templates and shows promise in adjusting layouts while maintaining design principles.

Paper Structure

This paper contains 24 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Our CPT model uses the context of the original template (far left) to generate font and color variations.
  • Figure 2: High-Level Overview of the Design Variations Pipeline
  • Figure 3: Human evaluation results
  • Figure 4: Our CPT model generates stylistic variations (right) from an original design (left). Each row shows a different example with either font or color variation. Notably, Example 2 illustrates the use of world knowledge to select a Halloween-inspired color palette, while Example 3 demonstrates context-aware typography, where playful fonts are chosen to match the event theme.
  • Figure 5: Examples of layout variations generated from a single template: the original square format (1:1) in the first row, adapted to a YouTube thumbnail in the second row, and to an Instagram Story in the third row.
  • ...and 1 more figure