A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space
Huijie Liu, Shuhao Cui, Haoxiang Cao, Shuai Ma, Kai Wu, Guoliang Kang
TL;DR
The paper tackles the challenge of generating novel, visually consistent styles without reference images or lengthy prompts by introducing code-to-style image generation. It presents CoTyle, the first open-source framework for this task. CoTyle learns a discrete style codebook whose embeddings condition a diffusion-based text-to-image model. It also learns an autoregressive style generator that models the distribution of those embeddings, which enables style synthesis from numerical codes. Extensive experiments show that CoTyle achieves high style consistency and competitive creativity, and that it also supports image-conditioned generation and interpolation between styles. The work offers a reproducible, portable approach to open-ended style design and paves the way for further research on discrete stylistic representations across modalities.
Abstract
Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typically rely on lengthy textual prompts, reference images, or parameter-efficient fine-tuning to guide style-aware image generation, but they often suffer from inconsistent styles, limited creativity, and complex style representations. In this paper, we affirm that a style is worth one numerical code by introducing a novel task, code-to-style image generation, which produces images with novel, consistent visual styles conditioned solely on a numerical style code. To date, this field has been explored primarily by industry (e.g., Midjourney), with no open-source research from the academic community. To fill this gap, we propose CoTyle, the first open-source method for this task. Specifically, we first train a discrete style codebook from a collection of images to extract style embeddings. These embeddings serve as conditions for a text-to-image diffusion model (T2I-DM) to generate stylized images. Subsequently, we train an autoregressive style generator on the discrete style embeddings to model their distribution, allowing the synthesis of novel style embeddings. During inference, the style generator maps a numerical style code to a unique style embedding, and this embedding guides the T2I-DM to generate images in the corresponding style. Unlike existing methods, ours offers unparalleled simplicity and diversity, unlocking a vast space of reproducible styles from minimal input. Extensive experiments validate that CoTyle effectively turns a numerical code into a style controller, demonstrating that a style is worth one code.
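The inference path described above (numerical code, then discrete style tokens, then style embedding) can be sketched in miniature. The Python below is a toy illustration under stated assumptions, not CoTyle's implementation. It assumes a standard nearest-neighbor vector-quantization lookup for the discrete style codebook. It replaces the trained autoregressive style generator with a hypothetical fixed next-token distribution (`toy_next_token_probs`). It also assumes that the numerical style code acts as the random seed for sampling. All function names and sizes are hypothetical, and the T2I-DM conditioning step is omitted.

```python
import random

# Hypothetical toy sizes; the actual codebook size, embedding dimension,
# and token-sequence length are not specified in the abstract.
CODEBOOK_SIZE = 8
EMBED_DIM = 4
SEQ_LEN = 3


def make_codebook(size, dim, seed=0):
    """Stand-in for a learned style codebook: `size` random vectors of length `dim`."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(size)]


def quantize(feature, codebook):
    """Return the index of the codebook entry nearest to `feature` (squared L2).

    Assumes standard vector quantization; used when building discrete style
    tokens from image features during training.
    """
    def sq_dist(entry):
        return sum((f - e) ** 2 for f, e in zip(feature, entry))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def toy_next_token_probs(prefix, vocab_size):
    """Hypothetical stand-in for the trained autoregressive style generator:
    returns a next-token distribution that depends on the tokens so far."""
    weights = [1.0 + ((sum(prefix) + k) % 3) for k in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def code_to_style_tokens(style_code, seq_len=SEQ_LEN, vocab_size=CODEBOOK_SIZE):
    """Map a numerical style code to a token sequence, token by token.

    Assumption: the code seeds the sampler, so the same code always
    reproduces the same sequence.
    """
    rng = random.Random(style_code)
    tokens = []
    for _ in range(seq_len):
        probs = toy_next_token_probs(tokens, vocab_size)
        tokens.append(rng.choices(range(vocab_size), weights=probs, k=1)[0])
    return tokens


def tokens_to_embedding(tokens, codebook):
    """Look up the codebook vector for each sampled token (the style condition)."""
    return [codebook[t] for t in tokens]


if __name__ == "__main__":
    codebook = make_codebook(CODEBOOK_SIZE, EMBED_DIM)
    tokens = code_to_style_tokens(42)
    print(tokens, tokens == code_to_style_tokens(42))  # second value is True
    print(tokens_to_embedding(tokens, codebook))
```

Because the style code only seeds the sampler in this sketch, the same code always reproduces the same token sequence and therefore the same style embedding, which illustrates the reproducibility the abstract emphasizes. In a real system, the generator would be a trained sequence model over codebook indices, and the resulting embedding would be passed to the T2I-DM as its style condition.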
