HairWeaver: Few-Shot Photorealistic Hair Motion Synthesis with Sim-to-Real Guided Video Diffusion

Di Chang; Ji Hou; Aljaz Bozic; Assaf Neuberger; Felix Juefei-Xu; Olivier Maury; Gene Wei-Chin Lin; Tuur Stuyck; Doug Roble; Mohammad Soleymani; Stephane Grabli

TL;DR

HairWeaver tackles realistic hair dynamics in single-image video animation with two LoRA adapters: one injects hair motion conditions, and the other bridges the gap between CG and photorealistic data. It trains on a CG-based dynamic-hair dataset in two stages. The domain adapter first adapts the diffusion backbone to the CG domain, then the motion adapter is trained on top of it. At inference the domain adapter is discarded, which yields photorealistic results with controllable hair motion. The approach reports state-of-the-art quantitative metrics and strong user-study preferences on both CG and NeRSemble benchmarks. This work enables few-shot, photorealistic hair motion synthesis with potential applications in VFX, games, and VR.

Abstract

We present HairWeaver, a diffusion-based pipeline that animates a single human image with realistic and expressive hair dynamics. Existing methods control body pose successfully but offer no specific control over hair, so they fail to capture intricate hair motion and produce stiff, unrealistic animations. HairWeaver overcomes this limitation with two specialized modules: a Motion-Context-LoRA that integrates motion conditions and a Sim2Real-Domain-LoRA that preserves the subject's photorealistic appearance across data domains. These lightweight components guide a video diffusion backbone while maintaining its core generative capabilities. By training on a specialized dataset of dynamic human motion generated with a CG simulator, HairWeaver affords fine control over hair motion and learns to produce highly realistic hair that responds naturally to movement. Comprehensive evaluations demonstrate that our approach sets a new state of the art, producing lifelike human hair animations with dynamic details.
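
To make the role of the two lightweight modules concrete, the following is a minimal sketch of how low-rank (LoRA) updates can sit on top of a frozen weight and how one of them can be dropped at inference. It is not the authors' implementation: the tiny 2x2 shapes, the names `domain_lora` and `motion_lora`, the `alpha` scaling, and the assumption that both adapters add to the same weight are all illustrative.

```python
# Hypothetical sketch: two LoRA adapters on one frozen linear weight.
# Shapes, names, and the additive composition are assumptions for
# illustration, not HairWeaver's actual code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def lora_delta(down, up, alpha):
    """Low-rank update (alpha / r) * up @ down, where r = number of rows in down."""
    rank = len(down)
    return scale(matmul(up, down), alpha / rank)

def effective_weight(base, adapters):
    """Frozen base weight plus the sum of all active adapter updates."""
    weight = base
    for down, up, alpha in adapters:
        weight = add(weight, lora_delta(down, up, alpha))
    return weight

# Frozen backbone weight (2x2 identity for readability).
base = [[1.0, 0.0], [0.0, 1.0]]

# Rank-1 adapters: down is 1x2, up is 2x1, alpha = 1.
domain_lora = ([[1.0, 1.0]], [[0.5], [0.0]], 1.0)  # stands in for Sim2Real-Domain-LoRA
motion_lora = ([[0.0, 2.0]], [[0.0], [1.0]], 1.0)  # stands in for Motion-Context-LoRA

# Training on CG data: both adapters are active.
train_weight = effective_weight(base, [domain_lora, motion_lora])

# Inference: the domain adapter is discarded, the motion adapter stays.
infer_weight = effective_weight(base, [motion_lora])
```

In this toy setup the base weight is never modified, so removing the domain adapter leaves the backbone plus the motion update only, which is the property the pipeline relies on when it discards the Sim2Real-Domain-LoRA at inference.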

Paper Structure (20 sections, 9 equations, 5 figures, 7 tables)

Introduction
Related Work
Diffusion Models for Human Video Animation
Hair Synthesis
Method
Preliminaries
Synthetic Hair Motion Generation
Motion-Context-LoRA
Sim2Real-Domain-LoRA and Training
Experiments
Implementation Details
Evaluations and Comparisons
Limitations and Future Works
Conclusion
Ethics Statement
...and 5 more sections

Figures (5)

Figure 1: Photorealistic hair motions generated by HairWeaver.
Figure 2: Overview of the HairWeaver pipeline. a) We use CG simulation to generate data, including human videos with motions $\mathbf{V}_{gt}$, a static reference image $\mathbf{I}_{ref}$ (a frame from $\mathbf{V}_{gt}$), a pose condition $\mathbf{C}_{pose}$, and a hair condition $\mathbf{C}_{hair}$. b) During training stage 1, we use a diffusion transformer (DiT) [peebles2023scalable] as the backbone model and pre-train the Sim2Real-Domain-LoRA. This training is conducted in an image-to-video manner with $\mathbf{I}_{ref}$ and a text prompt for $\mathbf{V}_{gt}$. c) During training stage 2, we freeze the Sim2Real-Domain-LoRA and finetune the Motion-Context-LoRA with the pose condition $\mathbf{C}_{pose}$ and the hair condition $\mathbf{C}_{hair}$ as additional guidance. d) During inference, the Sim2Real-Domain-LoRA is discarded, and the trained model generates photorealistic human videos with hair and body motions from a photorealistic reference image and the CG conditions $\mathbf{C}_{pose}, \mathbf{C}_{hair}$. e) Details of the model architecture presented in (c). The Pose Encoder integrates the body motions as a trainable residual added to the noisy latent. The hair motions are encoded by a frozen VAE-Encoder and passed to the DiT blocks as additional attention context. The only trainable modules are the Pose Encoder and the Motion-Context-LoRA.
Figure 3: Visual comparison between HairWeaver and the previous state-of-the-art baselines [wang2025unianimate, wan2025]. Our model generates more realistic and diverse hair motions.
Figure 4: Photorealistic motions generated by HairWeaver. The reference images are photorealistic human subjects generated by Flux [flux2024].
Figure 5: Ablation analysis of the Sim2Real-Domain-LoRA. We visualize the generation without (w/o) and with (w/) the Sim2Real-Domain-LoRA. Without this module, the model cannot preserve the reference's appearance when the reference is a photorealistic image.
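
The stage-by-stage schedule described in the Figure 2 caption can be summarized as a small table of which modules are trained, frozen, or absent at each step. The sketch below is only a bookkeeping aid: the module identifiers and the `Stage` type are hypothetical, and treating the DiT backbone as frozen in stage 1 is an assumption (the caption states the trainable set explicitly only for stage 2).

```python
from dataclasses import dataclass

# Hypothetical module identifiers; the schedule itself follows the Figure 2 caption.

@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset   # modules whose parameters are updated
    frozen: frozenset      # modules loaded but not updated
    conditions: tuple      # inputs fed to the model in this stage

# Stage 1: pre-train the domain adapter in an image-to-video setup.
# Assumption: the DiT backbone stays frozen while the LoRA is trained.
STAGE_1 = Stage(
    name="train_stage_1",
    trainable=frozenset({"sim2real_domain_lora"}),
    frozen=frozenset({"dit_backbone"}),
    conditions=("I_ref", "text_prompt"),
)

# Stage 2: freeze the domain adapter; train the motion adapter and pose encoder.
# C_pose and C_hair are "additional guidance" on top of the stage 1 inputs.
STAGE_2 = Stage(
    name="train_stage_2",
    trainable=frozenset({"motion_context_lora", "pose_encoder"}),
    frozen=frozenset({"dit_backbone", "sim2real_domain_lora", "vae_encoder"}),
    conditions=STAGE_1.conditions + ("C_pose", "C_hair"),
)

# Inference: the domain adapter is discarded; nothing is trained.
INFERENCE = Stage(
    name="inference",
    trainable=frozenset(),
    frozen=frozenset({"dit_backbone", "motion_context_lora",
                      "pose_encoder", "vae_encoder"}),
    conditions=("I_ref", "C_pose", "C_hair"),
)

def loaded_modules(stage):
    """Return the sorted names of all modules present in a stage."""
    return sorted(stage.trainable | stage.frozen)

if __name__ == "__main__":
    for stage in (STAGE_1, STAGE_2, INFERENCE):
        print(stage.name, "->", loaded_modules(stage))
```

Writing the schedule this way makes the key design choice easy to check: the Sim2Real-Domain-LoRA appears in both training stages but not in the inference configuration.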
