Stable-Hair: Real-World Hair Transfer via Diffusion Model

Yuxuan Zhang, Qing Zhang, Yiren Song, Jichao Zhang, Hao Tang, Jiaming Liu

TL;DR

Stable-Hair introduces the first diffusion-based framework for robust real-world hairstyle transfer. It handles complex hairstyles with a two-stage pipeline that first converts the source image to a bald proxy and then transfers the target hairstyle via a Hair Extractor and a Latent IdentityNet. A Latent ControlNet keeps color and content consistent in non-hair regions, while an automated data-generation pipeline (using ChatGPT and inpainting) produces the triplet training data. Experiments show state-of-the-art fidelity, fine-grained hair detail, and strong pose robustness, supported by quantitative metrics and user studies. The work advances practical virtual hair try-on while acknowledging ethical considerations and limitations such as accidental transfer of accessories.

Abstract

Current hair transfer methods struggle to handle diverse and intricate hairstyles, limiting their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based hair transfer framework, named Stable-Hair, which robustly transfers a wide range of real-world hairstyles to user-provided faces for virtual hair try-on. To achieve this goal, our Stable-Hair framework is designed as a two-stage pipeline. In the first stage, we train a Bald Converter alongside Stable Diffusion to remove hair from the user-provided face images, resulting in bald images. In the second stage, we design a Hair Extractor and a Latent IdentityNet to transfer the target hairstyle to the bald image in high detail and with high fidelity. The Hair Extractor is trained to encode reference images with the desired hairstyles, while the Latent IdentityNet ensures consistency in identity and background. To minimize color deviations between source images and transfer results, we introduce a novel Latent ControlNet architecture, which functions as both the Bald Converter and the Latent IdentityNet. After training on our curated triplet dataset, our method accurately transfers highly detailed, high-fidelity hairstyles to the source images. Extensive experiments demonstrate that our approach achieves state-of-the-art performance compared to existing hair transfer methods. Project page: https://xiaojiu-z.github.io/Stable-Hair.github.io/
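
The data flow of the two-stage pipeline can be sketched in a few lines of Python. This is only an illustration of the order of operations described in the abstract, not the authors' implementation: the class name `TwoStageHairTransfer`, its field names, and the dictionary-based toy "images" are all hypothetical, and the three callables stand in for the trained networks.

```python
from dataclasses import dataclass
from typing import Any, Callable

Image = Any         # placeholder for whatever image type the real models use
HairFeatures = Any  # placeholder for the Hair Extractor's output


@dataclass
class TwoStageHairTransfer:
    """Hypothetical wrapper; each callable stands in for a trained network."""
    bald_converter: Callable[[Image], Image]
    hair_extractor: Callable[[Image], HairFeatures]
    transfer_model: Callable[[Image, HairFeatures], Image]

    def __call__(self, source: Image, reference: Image) -> Image:
        # Stage 1: remove hair from the source to obtain a bald proxy.
        bald_proxy = self.bald_converter(source)
        # Stage 2: encode the reference hairstyle, then transfer it onto
        # the bald proxy while keeping identity and background.
        hair_features = self.hair_extractor(reference)
        return self.transfer_model(bald_proxy, hair_features)


# Toy stand-ins (assumptions for illustration): an "image" is a dict.
def toy_bald_converter(img):
    return {**img, "hair": None}


def toy_hair_extractor(ref):
    return ref["hair"]


def toy_transfer_model(bald, hair):
    return {**bald, "hair": hair}


pipeline = TwoStageHairTransfer(
    toy_bald_converter, toy_hair_extractor, toy_transfer_model
)
source = {"face": "user", "hair": "short", "background": "park"}
reference = {"face": "model", "hair": "curly bob", "background": "studio"}
result = pipeline(source, reference)
```

In this toy run, `result` keeps the source's face and background and takes only the hairstyle from the reference, which mirrors the roles the abstract assigns to the Hair Extractor and the Latent IdentityNet.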

Paper Structure (33 sections, 2 equations, 14 figures, 4 tables)


Figures (14)

  • Figure 1: Stable-Hair is the first diffusion-based method for hairstyle transfer, capable of handling an extensive range of real-world hairstyles with exceptional robustness. Unlike previous methods, which often struggle with complex or intricate styles, Stable-Hair achieves remarkably detailed and high-fidelity transfers while preserving the original identity content.
  • Figure 2: Overall schematics of our method. Our pipeline consists of two stages. First, the user's input source image is transformed into a bald proxy image by utilizing a Bald Converter. In the second stage, we employ the pre-trained SD model along with a Hair Extractor to transfer the reference hair onto the bald proxy image. The Hair Extractor is responsible for capturing the intricate details and features of the reference hair. These features are then injected into the SD model through newly added hair cross-attention layers. After training on the triplet dataset constructed using our specially designed automated data pipeline, our method achieves highly detailed and high-fidelity hair transfers, resulting in natural and visually appealing outcomes.
  • Figure 3: Synthetic Training Data: We propose an automated data generation pipeline to generate {Original image (or frame 1), Reference image, Bald proxy image} triplets for training. The pipeline uses ChatGPT to generate text prompts, the Stable Diffusion Inpainting model to generate reference images, and our pre-trained Bald Converter to convert the original image or one of the frames sampled from videos into the bald proxy image.
  • Figure 4: Qualitative comparison of different methods. Compared to other approaches, our method achieves more refined and stable hairstyle transfer without the need for precise facial alignment or explicit masks for supervision.
  • Figure 5: Visual comparison of hair removal with HairMapper. Our Bald Converter demonstrates robust performance, effectively converting source images across a wide range of poses, half-body shots, and even animated characters. In contrast, HairMapper struggles with these diverse scenarios, failing to maintain identity and background consistency.
  • ...and 9 more figures
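
The triplet construction described in the Figure 3 caption can be sketched as follows. This is a minimal sketch under stated assumptions: `HairTriplet`, `build_triplet`, and all parameter names are hypothetical, the three callables stand in for ChatGPT prompt generation, the Stable Diffusion Inpainting model, and the pre-trained Bald Converter, and the optional `video_frame` argument is one possible reading of "one of the frames sampled from videos". The caption does not say which region is inpainted, so that detail is left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Image = Any  # placeholder image type


@dataclass(frozen=True)
class HairTriplet:
    """Hypothetical container for one training example."""
    original: Image
    reference: Image
    bald_proxy: Image


def build_triplet(
    original: Image,
    make_prompt: Callable[[], str],                    # stand-in for ChatGPT
    generate_reference: Callable[[Image, str], Image], # stand-in for SD Inpainting
    bald_converter: Callable[[Image], Image],          # stand-in for Bald Converter
    video_frame: Optional[Image] = None,               # assumption: alternate frame
) -> HairTriplet:
    # 1. Obtain a text prompt for the inpainting model.
    prompt = make_prompt()
    # 2. Generate the reference image from the original and the prompt.
    reference = generate_reference(original, prompt)
    # 3. Convert the original (or a sampled video frame) into the bald proxy.
    bald_input = original if video_frame is None else video_frame
    bald_proxy = bald_converter(bald_input)
    return HairTriplet(original, reference, bald_proxy)


# Toy usage with strings in place of images.
triplet = build_triplet(
    "img0",
    make_prompt=lambda: "a person in a studio",
    generate_reference=lambda img, prompt: f"ref({img}|{prompt})",
    bald_converter=lambda img: f"bald({img})",
)
```

Passing the models in as callables keeps the sketch independent of any particular diffusion library; a real pipeline would replace each lambda with a model call.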