A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification

Seungkwon Kim, Sangyeon Kim, Seung-Hun Nam

TL;DR

This work tackles practical portrait stylization by addressing skin-tone fidelity and explicit-content filtering in production settings. It introduces two modules: STAPSM, a skin-tone-aware portrait stylization module that uses LoRA fine-tuning with skin-tone spectrum augmentation and progressive edge-then-depth ControlNet inference, and NCIM, a nudity content identification module that combines CLIP-based filtering with BLIP caption-based keyword matching. The proposed system demonstrates superior skin-tone preservation and improved nudity detection reliability, and it has been deployed in a real-world service, TOON-FILTER, handling over 2 million images with no reported incidents. Together, these contributions enable high-quality, safer portrait stylization suitable for enterprise Webtoon IP applications.

Abstract

Portrait stylization is a challenging task involving the transformation of an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved the quality of outcomes in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve the distinct characteristics of an input, such as skin-tone, while maintaining the quality of stylization remains lacking. These challenges have hindered the wide deployment of such a framework. To address these issues, this study proposes a portrait stylization framework that incorporates a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). In experiments, NCIM showed good performance in enhancing explicit content filtering, and STAPSM accurately represented a diverse range of skin tones. Our proposed framework has been successfully deployed in practice, and it has effectively satisfied critical requirements of real-world applications.
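To make the NCIM idea concrete, the following is a minimal sketch of a two-stage check that combines an embedding-similarity filter with caption keyword matching. It is not the authors' implementation: the threshold, the keyword list, all function names, and the choice to reject when either stage fires are assumptions made for illustration. The image embedding and the caption are taken as plain inputs; in a real system they would presumably come from CLIP and BLIP respectively.

```python
import math
import re

# Hypothetical values for illustration only; the source does not state them.
NSFW_SIMILARITY_THRESHOLD = 0.25
NUDITY_KEYWORDS = {"nude", "naked", "topless", "bikini", "underwear"}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_flags_nudity(image_embedding, concept_embeddings,
                           threshold=NSFW_SIMILARITY_THRESHOLD):
    """Stage 1: flag the image if it is close to any explicit-concept embedding."""
    return any(
        cosine_similarity(image_embedding, concept) >= threshold
        for concept in concept_embeddings
    )


def caption_flags_nudity(caption, keywords=NUDITY_KEYWORDS):
    """Stage 2: flag the image if its caption contains a blocked keyword."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    return not tokens.isdisjoint(keywords)


def is_rejected(image_embedding, concept_embeddings, caption):
    """Reject when either stage fires (an assumed combination rule)."""
    return (
        embedding_flags_nudity(image_embedding, concept_embeddings)
        or caption_flags_nudity(caption)
    )
```

Combining the two stages with a logical OR favors recall: a caption such as "a woman in a bikini" is rejected even when the embedding stage misses it, which is the kind of case a single embedding-based detector may get wrong.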

Paper Structure

This paper contains 8 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Abnormal portrait stylization when using a generative model fine-tuned for a specific Webtoon character. The proposed framework is designed to prevent such issues.
  • Figure 2: Overall architecture of the proposed STAPSM, consisting of a fine-tuning phase with skin-tone spectrum augmentation and a progressive inference phase.
  • Figure 3: Overall architecture of NCIM with CLIP embedding-based filtering and BLIP caption-based keyword matching.
  • Figure 4: Analysis of the skin-tone distribution in real-world portraits, original samples, and augmented samples. Kernel density estimation plots for each RGB channel were generated from the skin regions parsed by BiSeNet V2 [yu2021bisenet]. Because the original samples are concentrated in specific bands, as shown in (b), we adopted skin-tone spectrum augmentation to obtain the distribution in (c), which emulates the more even distribution in (a).
  • Figure 5: Nudity identification analysis. As shown in (a) and (b), NSFW-D [laionnsfw] misclassifies bikini images, which are regarded as explicit content in some cultures. A possible misuse case when the nudity filtering system does not work well is shown in (c). In (d), larger words indicate higher frequency.
  • ...and 1 more figure
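
The skin-tone spectrum augmentation described for Figure 4 can be illustrated with a rough sketch. The source does not specify how the augmentation is computed, so everything below is an assumption: the additive per-channel shift, the linear interpolation between two anchor tones, and all function and parameter names are hypothetical. Pixels are given as a flat list of (R, G, B) tuples, and the skin mask is a list of booleans standing in for the output of a face-parsing model.

```python
import random


def mean_skin_tone(pixels, skin_mask):
    """Per-channel mean (R, G, B) over the pixels marked as skin."""
    skin = [p for p, is_skin in zip(pixels, skin_mask) if is_skin]
    if not skin:
        raise ValueError("skin_mask selects no pixels")
    n = len(skin)
    return tuple(sum(p[c] for p in skin) / n for c in range(3))


def shift_skin_tone(pixels, skin_mask, target_tone):
    """Shift skin pixels so their mean tone moves to target_tone.

    Non-skin pixels are left untouched. The match is exact only up to
    rounding and clipping to the 0..255 range.
    """
    current = mean_skin_tone(pixels, skin_mask)
    offset = tuple(t - c for t, c in zip(target_tone, current))
    shifted = []
    for p, is_skin in zip(pixels, skin_mask):
        if is_skin:
            p = tuple(min(255, max(0, round(v + o))) for v, o in zip(p, offset))
        shifted.append(p)
    return shifted


def sample_target_tone(dark_tone, light_tone, rng=random):
    """Sample a tone uniformly on the line between two anchor tones.

    Uniform sampling is what spreads the augmented samples across the
    spectrum instead of leaving them concentrated in a few bands.
    """
    t = rng.random()
    return tuple(d + t * (l - d) for d, l in zip(dark_tone, light_tone))
```

Applied to each training image with a freshly sampled target tone, a procedure of this kind would flatten the per-channel skin-tone distribution of the training set, which is the effect the figure attributes to the augmentation.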