SkyReels-Text: Fine-grained Font-Controllable Text Editing for Poster Design

Yunjie Yu, Jingchen Wu, Junchen Zhu, Chunze Lin, Guibin Chen

TL;DR

SkyReels-Text is a novel font-controllable framework for precise poster text editing that bridges the gap between general-purpose image editing and professional-grade typographic design.

Abstract

Artistic design, such as poster design, often demands rapid yet precise modification of textual content while preserving visual harmony and typographic intent, especially across diverse font styles. Although modern image editing models have grown increasingly powerful, they still fall short in fine-grained, font-aware text manipulation, which limits their utility in professional design workflows such as poster editing. To address this issue, we present SkyReels-Text, a novel font-controllable framework for precise poster text editing. Our method enables simultaneous editing of multiple text regions, each rendered in a distinct typographic style, while preserving the visual appearance of non-edited regions. Notably, our model requires neither font labels nor fine-tuning during inference: users can simply provide cropped glyph patches corresponding to their desired typography, even if the font is not included in any standard library. Extensive experiments on multiple datasets, including handwritten text benchmarks, show that SkyReels-Text achieves state-of-the-art performance in both text fidelity and visual realism, offering unprecedented control over font families and stylistic nuances. This work bridges the gap between general-purpose image editing and professional-grade typographic design.

Paper Structure

This paper contains 17 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: SkyReels-Text modifies the text embedded in images with novel fonts, each controlled by a single reference image.
  • Figure 2: SkyReels-Text supports editing the text in one image with different font styles.
  • Figure 3: Overview of the proposed method.
  • Figure 4: Comparison with state-of-the-art commercial image editing models in single-font editing. The first and second rows show the reference font style, the text before and after editing, and the input image. SkyReels-Text produces edits that follow the target typography more faithfully while keeping the background structure and content intact.
  • Figure 5: Comparison with state-of-the-art open-source image editing models in Chinese and English scene text editing.
  • ...and 1 more figure