LICA: Layered Image Composition Annotations for Graphic Design Research

Elad Hirsch, Shubham Yadav, Mohit Garg, Purvanshi Mehta

Abstract

We introduce LICA (Layered Image Composition Annotations), a large-scale dataset of 1,550,244 multi-layer graphic design compositions designed to advance structured understanding and generation of graphic layouts. In addition to rendered PNG images, LICA represents each design as a hierarchical composition of typed components including text, image, vector, and group elements, each paired with rich per-element metadata such as spatial geometry, typographic attributes, opacity, and visibility. The dataset spans 20 design categories and 971,850 unique templates, providing broad coverage of real-world design structures. We further introduce graphic design video as a new and largely unexplored challenge for current vision-language models through 27,261 animated layouts annotated with per-component keyframes and motion parameters. Beyond scale, LICA establishes a new paradigm of research tasks for graphic design, enabling structured investigations into problems such as layer-aware inpainting, structured layout generation, controlled design editing, and temporally-aware generative modeling. By representing design as a system of compositional layers and relationships, the dataset supports research on models that operate directly on design structure rather than pixels alone.
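
To make the idea of a hierarchical composition of typed components concrete, the following is a minimal sketch of how such a layout might be held in memory and flattened into component-level bounding boxes. Every field name (`kind`, `x`, `y`, `width`, `height`, `opacity`, `visible`, `children`) and the `iter_boxes` helper are hypothetical and are not taken from the LICA JSON schema. The sketch also assumes that child coordinates are relative to the parent group, which the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


# Hypothetical component record; field names are illustrative only
# and do not reflect the actual LICA annotation format.
@dataclass
class Component:
    kind: str                      # "text", "image", "vector", or "group"
    x: float                       # left edge, assumed relative to the parent
    y: float                       # top edge, assumed relative to the parent
    width: float
    height: float
    opacity: float = 1.0
    visible: bool = True
    children: List["Component"] = field(default_factory=list)


def iter_boxes(
    node: Component, dx: float = 0.0, dy: float = 0.0
) -> Iterator[Tuple[str, float, float, float, float]]:
    """Yield (kind, left, top, width, height) in absolute coordinates
    for every visible non-group component under `node`."""
    if not node.visible:
        return  # an invisible node hides its whole subtree
    left, top = dx + node.x, dy + node.y
    if node.kind == "group":
        for child in node.children:
            yield from iter_boxes(child, left, top)
    else:
        yield (node.kind, left, top, node.width, node.height)


# A toy layout: a background image plus a group holding a title
# and a hidden divider line.
layout = Component("group", 0, 0, 1080, 1080, children=[
    Component("image", 0, 0, 1080, 1080),
    Component("group", 100, 200, 880, 300, children=[
        Component("text", 20, 30, 400, 80),
        Component("vector", 0, 150, 880, 4, visible=False),
    ]),
])

boxes = list(iter_boxes(layout))
```

In this toy layout, `boxes` holds the background image and the title text in absolute coordinates; the hidden vector is skipped. A tree like this keeps the grouping information that a flat list of boxes would lose.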

Paper Structure (26 sections, 11 figures, 5 tables)

This paper contains 26 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: LICA samples. Rendered design layouts spanning diverse categories and aspect ratios, with corresponding component-level bounding-box annotations extracted from the structured layout JSON. Unlike prior datasets that provide only coarse bounding boxes, LICA encodes the full component hierarchy with per-element typographic, spatial, and style metadata.
  • Figure 2: Layout semantic metadata. Each layout is annotated with multiple levels of semantic information beyond structural attributes, covering layout description, aesthetic analysis, user intent and component descriptions. Together, these annotations bridge the gap between low-level structural representation and high-level design semantics.
  • Figure 3: Template variants in LICA. Each row shows four layout variants derived from the same design template. Variants share a coherent design language (color palette, typographic system, visual motifs, and compositional logic) while differing in element positioning, content, imagery, and spatial arrangement. This structure captures a nuanced form of design knowledge: the variants are unified not by identical parameters but by an implicit style identity that governs how elements relate to one another, a property that is often difficult to articulate verbally yet immediately recognisable to a trained designer. LICA contains 107,728 templates with multiple variants (up to 24 per template), providing large-scale natural supervision for studying style adherence, controlled variation, and content-agnostic layout consistency.
  • Figure 4: Template and layout semantic metadata. Each layout in the dataset is associated with language annotations describing the layout, user intent, and aesthetic analysis (examples of user intents are shown at the bottom). For templates, an additional annotation is provided at the template level (top).
  • Figure 5: Graphic design video in LICA. Each row shows four keyframes sampled from an animated design composition. Unlike static layouts, these compositions encode per-component temporal structure, including easing curves, start-time offsets, and durations, directly within the component hierarchy. Animations operate at the semantic layer level: each text block, image, or shape carries its own entrance, transition, and exit behavior independently of other elements, and timing is used as a deliberate compositional tool. This temporal logic is fundamentally different from natural video, where motion arises from continuous physical dynamics. In design video, motion is sparse, discrete, and driven by communicative intent rather than physical plausibility, making it a distinct and under-explored domain for temporal generative modeling. LICA contains 27,261 such animated layouts spanning presentations, social media videos, mobile ads, and other temporal design surfaces.
  • ...and 6 more figures
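
The Figure 5 caption describes per-component timing in terms of easing curves, start-time offsets, and durations. As a rough illustration of how those three parameters could combine, the sketch below computes the eased progress of one component's entrance at a given global time. The `Entrance` class, its field names, and the two easing functions are assumptions made for this example; they are not the LICA keyframe format.

```python
from dataclasses import dataclass


def ease_out_cubic(t: float) -> float:
    """Standard ease-out cubic curve: fast start, slow finish."""
    return 1.0 - (1.0 - t) ** 3


# Hypothetical easing registry; the names are illustrative only.
EASINGS = {"linear": lambda t: t, "ease_out_cubic": ease_out_cubic}


@dataclass
class Entrance:
    start: float             # start-time offset in seconds
    duration: float          # length of the animation in seconds, must be > 0
    easing: str = "linear"   # key into EASINGS

    def progress(self, time: float) -> float:
        """Return eased progress in [0, 1] at global time `time`."""
        t = (time - self.start) / self.duration
        t = min(1.0, max(0.0, t))  # clamp before and after the animation window
        return EASINGS[self.easing](t)


# A title that starts fading in 0.5 s after the composition begins
# and takes 1 s to finish.
title = Entrance(start=0.5, duration=1.0, easing="ease_out_cubic")
opacity_at = [title.progress(t) for t in (0.0, 1.0, 2.0)]
```

Because each component carries its own `Entrance`, elements can animate independently of one another. That matches the sparse, layer-level motion the caption contrasts with natural video.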