DemoCaricature: Democratising Caricature Generation with a Rough Sketch

Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song

TL;DR

The paper tackles the problem of generating personalised caricatures from a single reference photo and a rough sketch, using a diffusion-based framework augmented with a sketch-conditioned adapter and single-image personalisation. It introduces Explicit Rank-1 Model Editing to selectively modify identity-related concepts in cross-attention, along with Random Mask Reconstruction to improve robustness to distorted shapes, and Concept Regularisation to mitigate overfitting. Evaluations on WebCaricature show strong identity, style, and shape fidelity, outperforming deformation-based caricature methods and existing SD-based personalisation baselines, with additional validation from a human user study. The approach enables non-experts to create high-quality caricatures with minimal input, highlighting a practical pathway for AI-assisted, artist-friendly visual expression without supplanting human creators.

Abstract

In this paper, we democratise caricature generation, empowering individuals to effortlessly craft personalised caricatures with just a photo and a conceptual sketch. Our objective is to strike a delicate balance between abstraction and identity, while preserving the creativity and subjectivity inherent in a sketch. To achieve this, we present Explicit Rank-1 Model Editing alongside single-image personalisation, selectively applying nuanced edits to cross-attention layers for a seamless merge of identity and style. Additionally, we propose Random Mask Reconstruction to enhance robustness, directing the model to focus on distinctive identity and style features. Crucially, our aim is not to replace artists but to eliminate accessibility barriers, allowing enthusiasts to engage in the artistry.

Paper Structure (17 sections, 6 equations, 12 figures, 2 tables)

This paper contains 17 sections, 6 equations, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Within cross-attention layers, Explicit ROME (\ref{sec:rank1}) edits the concept entry with a trainable target output $\mathbf{o}^*$ that encapsulates the identity features. We also employ a dynamic masking method (\ref{sec:rmr}) that selectively occludes latent regions during training to enhance model robustness. Additional regularisation (\ref{sec:reg}) is applied to word embeddings and text encoding through the superclass. During inference, a frozen T2I-sketch-adapter \cite{mou2023t2i} provides shape guidance, resulting in an output caricature with the desired identity and shape. A similar training pipeline is used for the style image. We use Eq. \ref{eq:style_mixing} to perform sketch+style guided caricature generation.
  • Figure 2: Qualitative comparison with GAN-based deformation models. These visual results illustrate our method's higher fidelity and shape flexibility in caricature synthesis compared to existing methods, viz. StyleCariGAN \cite{Jang2021StyleCari}, CariGANs \cite{cao2018carigans}, and WarpGAN \cite{shi2019warpgan}.
  • Figure 3: Comparison with T2I personalisation approaches. Our framework outperforms Perfusion \cite{tewel2023keylocked} and TI \cite{gal2023an} in single-image personalised caricature synthesis.
  • Figure 4: Comparison with T2I personalisation approaches with style reference. Our model robustly generates stylised caricatures with faithful identity and style, surpassing methods such as Perfusion \cite{tewel2023keylocked} and TI \cite{gal2023an}.
  • Figure 5: Identity Scale Adaptability. Our method supports dynamic adjustment of the identity scale $s$, illustrating its flexibility.
  • ...and 7 more figures
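
The Figure 1 caption describes a rank-1 edit that makes a cross-attention projection map a concept entry to a target output $\mathbf{o}^*$. The sketch below is a minimal, generic illustration of that idea and not the paper's formulation: it assumes an identity key covariance, treats $\mathbf{o}^*$ as a fixed vector rather than a trainable one, and uses hypothetical names (`rank1_edit`, `W`, `k`, `o_star`) for a single linear projection.

```python
import numpy as np


def rank1_edit(W, k, o_star):
    """Return W' = W + (o* - W k) k^T / (k^T k).

    The update has rank 1 and guarantees W' @ k == o*.
    Assumption: identity key covariance (not taken from the paper).
    """
    residual = o_star - W @ k          # gap between current and target output
    return W + np.outer(residual, k) / (k @ k)


# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))            # stand-in for a value/key projection
k = rng.normal(size=3)                 # stand-in for the concept's input key
o_star = rng.normal(size=4)            # stand-in for the target output

W_new = rank1_edit(W, k, o_star)

# A direction orthogonal to k is left untouched by the edit.
v = rng.normal(size=3)
k_perp = v - (v @ k) / (k @ k) * k
```

Under these assumptions, `W_new @ k` equals `o_star`, while any input orthogonal to `k` (such as `k_perp`) produces the same output as before the edit, which is what makes the modification selective.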