SketchDeco: Training-Free Latent Composition for Precise Sketch Colourisation

Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

TL;DR

SketchDeco presents a training-free framework for precise sketch colourisation by decoupling global colour priors from local region edits. It adopts a two-stage pipeline: a global stage leveraging diffusion priors and semantic guidance to create a base image aligned with user palettes, followed by a local stage that composites region colours in latent space via diffusion inversion and guided sampling, aided by a self-attention mechanism. The method requires only a sketch, region masks, and colour palettes, achieving fast results (15–20 inference steps) on consumer GPUs without fine-tuning. Across diverse datasets, SketchDeco demonstrates strong global fidelity and local colour accuracy, including robust performance on in-the-wild sketches, making professional, region-aware colourisation more accessible for design workflows.

Abstract

We introduce SketchDeco, a training-free approach to sketch colourisation that bridges the gap between professional design needs and intuitive, region-based control. Our method empowers artists to use simple masks and colour palettes for precise spatial and chromatic specification, avoiding both the tediousness of manual assignment and the ambiguity of text-based prompts. We reformulate this task as a novel, training-free composition problem. Our core technical contribution is a guided latent-space blending process: we first leverage diffusion inversion to precisely "paint" user-defined colours into specified regions, and then use a custom self-attention mechanism to harmoniously blend these local edits with a globally consistent base image. This ensures both local colour fidelity and global harmony without requiring any model fine-tuning. Our system produces high-quality results in 15–20 inference steps on consumer GPUs, making professional-quality, controllable colourisation accessible.
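
As a rough illustration of the region-based composition idea only, the sketch below pastes pre-coloured regions into a base image wherever a binary mask is set. It is a hard pixel-space paste under stated assumptions: images are nested lists of pixel values, and the function name `composite` and its arguments are hypothetical. The diffusion inversion, guided sampling, and self-attention blending described in the abstract are not reproduced here.

```python
def composite(background, regions, masks):
    """Paste each region image into a copy of the background where its mask is truthy.

    background: H x W nested list of pixel values (hypothetical representation).
    regions:    list of H x W images, one per colour palette.
    masks:      list of H x W binary masks, aligned with `regions`.
    """
    height, width = len(background), len(background[0])
    # Copy row by row so the caller's background is left untouched.
    out = [row[:] for row in background]
    for region, mask in zip(regions, masks):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    out[y][x] = region[y][x]
    return out


# Tiny 2x2 example with string placeholders standing in for pixels.
background = [["bg", "bg"], ["bg", "bg"]]
red_image = [["r", "r"], ["r", "r"]]
blue_image = [["b", "b"], ["b", "b"]]
red_mask = [[1, 0], [0, 0]]
blue_mask = [[0, 0], [0, 1]]

result = composite(background, [red_image, blue_image], [red_mask, blue_mask])
print(result)
```

A hard paste like this leaves visible seams at mask boundaries, which is the kind of artefact the blending stage described above is meant to smooth out.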

Paper Structure (19 sections, 6 equations, 19 figures, 3 tables)

Figures (19)

  • Figure 1: Framework Overview. Given an input sketch and region-specific colour palettes with corresponding masks, our method employs a divide-and-conquer strategy consisting of two sequential stages: Global and Local Sketch Colourisations.
  • Figure 2: Global and Local Sketch Colourisation Stages. (a) In the global stage, given a sketch $\mathcal{S}$ and colour palettes $\{\mathcal{P_H}\}_{i=1}^{n}$, BLIP-2 [blip2] infers the sketch's class semantics, a K-D Tree [kdtree] maps palette hex codes to colour names, and Scribble ControlNet [diff4] generates globally colourised results $\{\mathcal{I}^{G}_{\mathcal{P_H}_{i}}\}_{i=1}^{n}$ and an auxiliary image $\mathcal{I}^{G}_{\mathcal{P}\phi}$, preserving $id(\mathcal{S})$ and $colour(\mathcal{P_{H}})$, while allowing interactive refinement of unsatisfactory textures. (b) In the local stage, pre-colourised regions from $\{\mathcal{I}^{G}_{\mathcal{P_H}_{i}}\}_{i=1}^{n}$ are composed with the background from $\mathcal{I}^{G}_{\mathcal{P}\phi}$ to form the composited image $\mathcal{I}^{\ast}$, which is then inverted to noisy latents $z^{\ast}$ and refined via guided sampling [lu2023dpmsolver]. Noise incorporation enhances boundary smoothness, while self-attention (SA) injection, guided by $\tau$, produces the final result $\mathcal{I}^{L}$ with smooth blending and structural fidelity.
  • Figure 3: 3D search space constructed using K-D Tree algorithm.
  • Figure 4: User-interactive refinement process for interior design.
  • Figure 5: Initial Gaussian noise incorporation in latent space.
  • ...and 14 more figures