
Stylecodes: Encoding Stylistic Information For Image Generation

Ciara Rowles

TL;DR

StyleCodes is an open-source, open-research style encoder architecture and training procedure that expresses an image's style as a 20-symbol base64 code. The authors' experiments show that this encoding incurs minimal loss in quality compared to traditional image-to-style techniques.

Abstract

Diffusion models excel at image generation, but controlling them remains a challenge. We focus on the problem of style-conditioned image generation. Although example images work as style references, they are cumbersome. srefs (style-reference codes) from MidJourney solve this issue by expressing a specific image style in a short numeric code. These codes have been widely adopted on social media, both because they are easy to share and because they allow an image to be used for style control without posting the source image itself. However, users cannot generate srefs from their own images, and the underlying training procedure is not public. We propose StyleCodes: an open-source and open-research style encoder architecture and training procedure that expresses image style as a 20-symbol base64 code. Our experiments show that our encoding results in minimal loss in quality compared to traditional image-to-style techniques.
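The abstract does not say how the encoder's output becomes base64 text, so the following is only a minimal sketch under a stated assumption: the encoder emits 20 values in [-1, 1], and each value is uniformly quantized to one of 64 levels and written as one symbol of the standard base64 alphabet. The names `quantize_to_code` and `code_to_latent`, and the quantization scheme itself, are hypothetical and are not the paper's method. The sketch mainly illustrates the capacity of such a code: 20 symbols × 6 bits = 120 bits, i.e. 64^20 = 2^120 ≈ 1.3 × 10^36 distinct style codes.

```python
import math
import string

# Standard base64 alphabet: 64 symbols, so each symbol carries 6 bits.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
CODE_LENGTH = 20            # symbols per style code, as stated in the abstract
LEVELS = len(B64_ALPHABET)  # 64 quantization levels per symbol


def quantize_to_code(latent):
    """Hypothetical scheme: map 20 floats in [-1, 1] to a 20-symbol base64 string."""
    if len(latent) != CODE_LENGTH:
        raise ValueError(f"expected {CODE_LENGTH} values, got {len(latent)}")
    symbols = []
    for x in latent:
        x = max(-1.0, min(1.0, x))                   # clamp to the assumed range
        idx = round((x + 1.0) / 2.0 * (LEVELS - 1))  # nearest of 64 uniform levels
        symbols.append(B64_ALPHABET[idx])
    return "".join(symbols)


def code_to_latent(code):
    """Inverse of quantize_to_code: map each symbol back to its level's value."""
    if len(code) != CODE_LENGTH:
        raise ValueError(f"expected {CODE_LENGTH} symbols, got {len(code)}")
    # str.index raises ValueError for any symbol outside the base64 alphabet.
    return [B64_ALPHABET.index(ch) / (LEVELS - 1) * 2.0 - 1.0 for ch in code]


if __name__ == "__main__":
    latent = [math.sin(i) for i in range(CODE_LENGTH)]  # stand-in for an encoder output
    code = quantize_to_code(latent)
    print(code, len(code))
    print(LEVELS ** CODE_LENGTH == 2 ** 120)  # capacity check: True
```

Under this assumed scheme, each decoded value differs from the clamped original by at most 1/63, half the spacing between adjacent levels. That lossy step is one place where a "minimal loss in quality" could arise; the paper's actual encoder may distribute information quite differently.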

Paper Structure

This paper contains 17 sections, 2 equations, and 4 figures.

Figures (4)

  • Figure 1: Our Style Encoder compresses image styles into compact strings for style-conditioned generation.
  • Figure 2: Auto Encoder and Control Module Architecture
  • Figure 3: Example results. The left-most column shows the source image; each source image is passed through the encoder to a stylecode, which is then used to generate images with the prompts "a close up man", "a woman portrait", "a cow" and "a bottle on a desk", using the same seeds.
  • Figure 4: Example results with various trained base models and the control module, using the prompt "portrait of a man" and four different style images.