Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching

Huai-Hsun Cheng, Siang-Ling Zhang, Yu-Lun Liu

TL;DR

Stroke of Surprise introduces Progressive Semantic Illusions to vector sketching: a single stroke sequence first depicts one concept and then, once delta strokes are added, a distinct second concept. The method uses sequence-aware, dual-branch Score Distillation Sampling (SDS) that jointly optimizes shared stroke parameters for both the prefix and the full sketch, augmented by an Overlay Loss that enforces spatial complementarity and prevents occlusion. By discovering a common structural subspace that supports multi-phase transitions, the approach significantly improves recognizability and illusion strength over state-of-the-art baselines and scales to more than two phases and to varied vector representations. The framework offers robust, human-aligned synthesis with potential applications in AI-assisted drawing and visual communication; the authors acknowledge limitations related to pre-trained diffusion priors and complex shape guidance.

Abstract

Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel vector sketching task where a single sketch undergoes a dramatic semantic transformation through the sequential addition of strokes. We present Stroke of Surprise, a generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. The core challenge lies in the "dual-constraint": initial prefix strokes must form a coherent object (e.g., a duck) while simultaneously serving as the structural foundation for a second concept (e.g., a sheep) upon adding delta strokes. To address this, we propose a sequence-aware joint optimization framework driven by a dual-branch Score Distillation Sampling (SDS) mechanism. Unlike sequential approaches that freeze the initial state, our method dynamically adjusts prefix strokes to discover a "common structural subspace" valid for both targets. Furthermore, we introduce a novel Overlay Loss that enforces spatial complementarity, ensuring structural integration rather than occlusion. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines in recognizability and illusion strength, successfully expanding visual anagrams from the spatial to the temporal dimension. Project page: https://stroke-of-surprise.github.io/
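The joint update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quadratic `branch_loss_and_grad` is a hypothetical stand-in for one SDS branch (no rasterizer or diffusion model is involved), and the names `joint_step`, `n_prefix`, `target_prefix`, and `target_full` are assumptions introduced for illustration. It shows only the gradient routing: prefix strokes receive gradients from both branches, while delta strokes receive gradients from the full-sketch branch alone.

```python
import numpy as np


def branch_loss_and_grad(strokes, target):
    """Hypothetical stand-in for one SDS branch.

    Loss is the squared distance between the mean stroke vector and a
    target vector. Returns the loss and its gradient for every stroke.
    """
    n = strokes.shape[0]
    residual = strokes.mean(axis=0) - target
    loss = float(residual @ residual)
    # d/ds_i ||mean(s) - target||^2 = 2 * residual / n for each stroke i
    grad = np.tile(2.0 * residual / n, (n, 1))
    return loss, grad


def joint_step(strokes, n_prefix, target_prefix, target_full, lr=0.1):
    """One gradient step on the summed prefix and full-sketch losses.

    strokes[:n_prefix] are the prefix strokes, strokes[n_prefix:] the
    delta strokes. Nothing is frozen: the prefix is updated too.
    """
    loss_prefix, grad_prefix = branch_loss_and_grad(strokes[:n_prefix], target_prefix)
    loss_full, grad_full = branch_loss_and_grad(strokes, target_full)

    grad = grad_full.copy()            # every stroke sees the full branch
    grad[:n_prefix] += grad_prefix     # prefix strokes also see the prefix branch
    return strokes - lr * grad, loss_prefix + loss_full


# Toy run: 4 prefix strokes + 2 delta strokes, each a 4-dimensional vector.
rng = np.random.default_rng(0)
strokes = rng.normal(size=(6, 4))
target_prefix, target_full = np.ones(4), -np.ones(4)

losses = []
for _ in range(50):
    strokes, total = joint_step(strokes, 4, target_prefix, target_full)
    losses.append(total)
```

In this toy setting the summed loss decreases from step to step, and changing `target_prefix` alters only the update applied to the first `n_prefix` strokes. A sequential baseline that freezes the initial state would correspond to zeroing `grad[:n_prefix]` once the prefix has been fitted.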

Paper Structure

This paper contains 29 sections, 9 equations, 20 figures, 1 table.

Figures (20)

  • Figure 1: Challenges in progressive illusion sketching. (a) Raster-based methods (e.g., Nano Banana Pro) rely on destructive editing, modifying the initial structure to fit the final target and thus violating the progressive constraint. (b) Vector-based baselines (e.g., SketchDreamer [qu2023sketchdreamer] or SketchAgent [vinker2025sketchagent]) employ a greedy strategy, where specific Phase 1 details become semantic noise or clutter in Phase 2. (c) Ours achieves dual-semantic coherency by jointly optimizing for a common structural subspace, ensuring the initial strokes are valid building blocks for both interpretations (e.g., "rabbit" $\rightarrow$ "elephant").
  • Figure 2: Pipeline overview. Our method optimizes a set of learnable stroke parameters, which are divided into prefix strokes $S_\text{prefix}$ and delta strokes $S_\text{delta}$. The optimization process involves two parallel branches. In the top branch, only the prefix strokes are rendered by a differentiable rasterizer to create a partial sketch (e.g., a rabbit). This sketch is then guided by a pre-trained, frozen text-to-image diffusion model using a prompt corresponding to the prefix ("a rabbit"), resulting in the prefix SDS loss $\mathcal{L}_{\text{SDS}}^{\text{prefix}}$. In the bottom branch, the full set of strokes is rendered to create the complete sketch (e.g., a horse). This is guided by the same diffusion model using a prompt for the full object ("a horse"), resulting in the full SDS loss $\mathcal{L}_{\text{SDS}}^{\text{full}}$. The total SDS guidance loss is the sum of these two terms: $\mathcal{L}_{\text{SDS}} = \mathcal{L}_{\text{SDS}}^{\text{prefix}} + \mathcal{L}_{\text{SDS}}^{\text{full}}$. Gradients from this total loss are backpropagated to update all learnable stroke parameters.
  • Figure 3: Motivation and formulation of the overlay loss. (Top) Motivation: Without constraints, redundant strokes (b) occlude the prefix. Hard intersection (c) allows strokes to be placed arbitrarily close, causing crowding. (Bottom) Formulation: We compute a soft overlay loss (f) from blurred maps (d, e). The blur expands the penalty region to create a spatial buffer, forcing new strokes to maintain sufficient distance from the prefix to ensure visual clarity and separation.
  • Figure 4: VLM-based evaluation and ranking pipeline. We employ GPT-4o to assess the quality of illusion sketches. (a) For Phase 1, the model evaluates the recognizability of the prefix sketch ($S_\text{prefix}$). (b) For Phase 2, the model evaluates the full sketch ($S_\text{full}$) while simultaneously comparing it against the delta strokes ($S_\text{delta}$). This comparison ensures that the prefix strokes provide essential structural scaffolding for the second concept, rather than being merely overwritten. High scores are awarded only when $S_\text{full}$ is significantly more recognizable than $S_\text{delta}$ alone.
  • Figure 5: Multi-phase pipeline. We scale to $K$ phases (e.g., Apple$\to$Sheep$\to$Einstein) using cumulative stroke subsets ($S_1, \ldots, S_K$). Parallel branches optimize each cumulative sketch $I_{1:i}$ against prompt $p_i$. Joint optimization ensures early strokes receive gradients from all subsequent losses ($\sum \mathcal{L}_{\text{SDS}}^i$), creating a structure primed for the entire evolutionary sequence.
  • ...and 15 more figures
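
The soft overlay idea in the Figure 3 caption can be sketched as follows. The caption says only that the loss is computed from blurred maps; the exact formula is not given here, so the form below (mean of the elementwise product of Gaussian-blurred ink maps) is an assumption, as are the names `blur`, `soft_overlay_loss`, `vertical_line`, and the parameter `sigma`. Binary NumPy arrays stand in for the differentiable rasterizer's output.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur(img, sigma=2.0):
    """Separable Gaussian blur (rows, then columns), zero-padded at the borders.

    Assumes the image is larger than the kernel in both dimensions.
    """
    k = gaussian_kernel(sigma, int(3 * sigma))
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def soft_overlay_loss(prefix_ink, delta_ink, sigma=2.0):
    """Assumed form: mean over pixels of blurred(prefix) * blurred(delta).

    Blurring widens each stroke's footprint, so delta strokes are penalized
    for being near the prefix, not only for touching it.
    """
    return float(np.mean(blur(prefix_ink, sigma) * blur(delta_ink, sigma)))


def vertical_line(col, size=32):
    """Toy ink map: a one-pixel-wide vertical stroke at the given column."""
    img = np.zeros((size, size))
    img[:, col] = 1.0
    return img


prefix = vertical_line(10)
on_top = soft_overlay_loss(prefix, vertical_line(10))   # delta drawn over the prefix
nearby = soft_overlay_loss(prefix, vertical_line(12))   # delta two pixels away
far = soft_overlay_loss(prefix, vertical_line(28))      # delta well separated
hard_nearby = float(np.mean(prefix * vertical_line(12)))  # unblurred intersection
```

Here `hard_nearby` is zero because the two strokes share no pixel, which is the crowding case the caption attributes to hard intersection, whereas `nearby` is positive and shrinks as the delta stroke moves away. A larger `sigma` widens the spatial buffer.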