SketcherX: AI-Driven Interactive Robotic drawing with Diffusion model and Vectorization Techniques

Jookyung Song, Mookyoung Kang, Nojun Kwak

TL;DR

This work advances the field of robotic art by positioning robots as active participants in the creative process, paving the way for future explorations in interactive, human-robot artistic collaboration.

Abstract

We introduce SketcherX, a novel robotic system for personalized portrait drawing through interactive human-robot engagement. Unlike traditional robotic art systems that rely on analog printing techniques, SketcherX captures and processes facial images to produce vectorized drawings in a distinctive, human-like artistic style. The system comprises two 6-axis robotic arms: a face robot, equipped with a head-mounted camera and a Large Language Model (LLM) for real-time interaction, and a drawing robot, which uses a fine-tuned Stable Diffusion model, ControlNet, and Vision-Language models for dynamic, stylized drawing. Our contributions include a custom Vector Low-Rank Adaptation (LoRA) model that enables seamless adaptation to various artistic styles, and a pair-wise fine-tuning approach that enhances stroke quality and stylistic accuracy. Experimental results demonstrate that the system produces high-quality, personalized portraits within two minutes, highlighting its potential as a new paradigm in robotic creativity. This work advances the field of robotic art by positioning robots as active participants in the creative process, paving the way for future explorations in interactive, human-robot artistic collaboration.
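The abstract presents Vector LoRA as the component that lets the system adapt to different artistic styles. The figure captions describe Vector LoRA being merged with a Style LoRA, but they do not give the merge rule.

The following is a minimal sketch of one common way to combine LoRA adapters. It assumes the standard LoRA formulation, in which each adapter adds a low-rank update $\Delta W = BA$ to a frozen base weight $W_0$. Adapters are then combined as a weighted sum, $W' = W_0 + \sum_i w_i B_i A_i$. The function name `merge_loras`, the adapter ranks, and the per-adapter weights are hypothetical illustrations, not the paper's method.

```python
import numpy as np


def merge_loras(w0, adapters):
    """Fold weighted low-rank updates into a copy of a base weight matrix.

    w0:       base weight of shape (d_out, d_in); left unmodified.
    adapters: iterable of (B, A, weight), with B of shape (d_out, r) and
              A of shape (r, d_in); each contributes weight * (B @ A).
    """
    merged = np.array(w0, dtype=np.float64, copy=True)
    d_out, d_in = merged.shape
    for b, a, weight in adapters:
        # Reject factors whose shapes cannot form a (d_out, d_in) update.
        if b.shape[0] != d_out or a.shape[1] != d_in or b.shape[1] != a.shape[0]:
            raise ValueError("LoRA factor shapes do not match the base weight")
        merged += weight * (b @ a)
    return merged


# Hypothetical example: a rank-1 "vector" adapter and a rank-2 "style" adapter.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 3))
vector_lora = (np.ones((4, 1)), np.ones((1, 3)), 1.0)
style_lora = (np.ones((4, 2)), 0.5 * np.ones((2, 3)), 0.5)
merged = merge_loras(w0, [vector_lora, style_lora])
delta = merged - w0  # each entry is approximately 1.0 * 1 + 0.5 * (2 * 0.5) = 1.5
```

Folding the adapters into one matrix keeps inference cost equal to that of the base model. The relative weights set how strongly the style update is expressed against the vector-stroke update. Whether SketcherX merges statically like this or weights adapters at runtime is an assumption here.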

Paper Structure

This paper contains 15 sections, 4 equations, and 10 figures.

Figures (10)

  • Figure 1: Overview of the SketcherX process. The top-right section shows an actual photograph of SketcherX at the CES 2023 exhibition and examples of actual drawings. SketcherX consists of two robotic arms: one interacts with users using a camera and an LLM, while the other processes images to create stylized, vectorized portraits.
  • Figure 2: Overview of the robotic vector sketching process. It begins by extracting the user's facial features with a Canny edge preprocessor; the encoded latent vector $c_{f}$ is fed into ControlNet. Simultaneously, a vision-language (VL) model generates textual descriptions of the user's attributes, which are refined and encoded by the CLIP text encoder into $c_{t}$. The text-to-image model, initialized with the latent vector $z_{t}$, is fine-tuned by merging Vector LoRA and Style LoRA, producing vector-friendly strokes while maintaining the desired artistic style.
  • Figure 3: We fine-tuned the model on pair-wise data using both a context consistency loss and a reconstruction loss. For example, if the text prompt for the regularization image $x_{reg}$ is "A middle-aged man with glasses {description}", the corresponding text prompt for the style image $x_{style}$ would be "A middle-aged man with glasses {description} + lineart by Xorbis {unique identifier}", creating a pair-wise dataset. Here, $c_{style}$ and $c_{reg}$ are the encoded vectors of the text prompts for $x_{style}$ and $x_{reg}$, respectively. The generated image $\hat{x}_{\theta}(z, c_{style})$ is conditioned on $c_{style}$. By calculating the L2 loss between the two generated images, we maintain context consistency during fine-tuning.
  • Figure 4: Style1 LoRA and Style2 LoRA represent distinct artistic styles. When each is merged with Vector LoRA, its style is preserved but rendered as clean, vector-friendly strokes. Using only the Style LoRAs results in distorted SVG shapes unsuitable for robotic drawing, whereas combining them with Vector LoRA produces clean, continuous strokes ideal for robotic drawing.
  • Figure 5: Hardware configuration and system schematic illustrating the setup of SketcherX.
  • ...and 5 more figures
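The Figure 3 caption describes the pair-wise fine-tuning setup only in terms of its prompts and two loss terms. The following is a minimal sketch under these assumptions:

- The "+" in the style prompt denotes plain concatenation.
- The reconstruction loss is the mean squared error between the style-conditioned output $\hat{x}_{\theta}(z, c_{style})$ and a style target.
- The context consistency loss is the mean squared error between the outputs conditioned on $c_{style}$ and on $c_{reg}$ for the same latent $z$.
- The two losses are combined with a weight `lam`.

The function names, the `lam` weight, the example description, and the identifier token `<xorbis-id>` are hypothetical placeholders. The sketch computes loss values with NumPy only; a real fine-tuning loop would evaluate these terms on diffusion model predictions inside an autograd framework.

```python
import numpy as np


def make_pairwise_prompts(base_prompt, identifier):
    """Build a (regularization, style) prompt pair from one base description.

    The style prompt appends the style phrase quoted in Figure 3; joining
    with a comma and the identifier token are assumptions.
    """
    reg_prompt = base_prompt
    style_prompt = f"{base_prompt}, lineart by Xorbis {identifier}"
    return reg_prompt, style_prompt


def pairwise_losses(x_hat_style, x_hat_reg, x_style_target, lam=1.0):
    """Return (reconstruction, context consistency, weighted total) as floats.

    x_hat_style:    output conditioned on c_style
    x_hat_reg:      output conditioned on c_reg (same latent z assumed)
    x_style_target: target for the style-conditioned output (assumed)
    """
    rec = float(np.mean((x_hat_style - x_style_target) ** 2))
    ctx = float(np.mean((x_hat_style - x_hat_reg) ** 2))
    return rec, ctx, rec + lam * ctx


reg, style = make_pairwise_prompts(
    "A middle-aged man with glasses, smiling", "<xorbis-id>"
)
# reg   -> "A middle-aged man with glasses, smiling"
# style -> "A middle-aged man with glasses, smiling, lineart by Xorbis <xorbis-id>"

rec, ctx, total = pairwise_losses(
    x_hat_style=np.ones((2, 2)),
    x_hat_reg=np.zeros((2, 2)),
    x_style_target=np.ones((2, 2)),
    lam=0.5,
)
# rec = 0.0, ctx = 1.0, total = 0.5
```

In this sketch, both prompts share the same description, so the consistency term penalizes differences that the added style tokens introduce beyond the intended style. This matches the caption's stated goal of maintaining context consistency, though the exact loss form in the paper may differ.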