DiffStyle360: Diffusion-Based 360° Head Stylization via Style Fusion Attention

Furkan Guzelant, Arda Goktogan, Tarık Kaya, Aysegul Dundar

TL;DR

DiffStyle360 is a diffusion-based method for 360° head stylization. From a single style reference, it preserves identity and keeps views consistent, and it needs no retraining for each new style. It introduces two components. A Style Appearance Module disentangles style from content. A Style Fusion Attention mechanism balances structure and style in the latent space. The method also fine-tunes on GAN-generated multi-view data and uses temperature-based key scaling to control stylization strength. On the FFHQ and RenderMe360 benchmarks it achieves higher style fidelity, view coherence, and depth consistency than prior methods, and participants preferred it in user studies. By combining diffusion priors with adaptive style integration, the work supports arbitrary style transfer without per-style training.

Abstract

3D head stylization has emerged as a key technique for reimagining realistic human heads in various artistic forms, enabling expressive character design and creative visual experiences in digital media. Despite the progress in 3D-aware generation, existing 3D head stylization methods often rely on computationally expensive optimization or domain-specific fine-tuning to adapt to new styles. To address these limitations, we propose DiffStyle360, a diffusion-based framework capable of producing multi-view consistent, identity-preserving 3D head stylizations across diverse artistic domains given a single style reference image, without requiring per-style training. Building upon the 3D-aware DiffPortrait360 architecture, our approach introduces two key components: the Style Appearance Module, which disentangles style from content, and the Style Fusion Attention mechanism, which adaptively balances structure preservation and stylization fidelity in the latent space. Furthermore, we employ a 3D GAN-generated multi-view dataset for robust fine-tuning and introduce a temperature-based key scaling strategy to control stylization intensity during inference. Extensive experiments on FFHQ and RenderMe360 demonstrate that DiffStyle360 achieves superior style quality, outperforming state-of-the-art GAN- and diffusion-based stylization methods across challenging style domains.
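The abstract names temperature-based key scaling but does not give its form. The sketch below shows one way such a control could work. It assumes, without confirmation from the paper, that the style keys are multiplied by a scalar `lam` before scaled dot-product attention, which sharpens the style logits when `lam > 1` and flattens them when `lam < 1`. The names `style_attention_mass` and `lam` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def style_attention_mass(q, k_other, k_style, lam):
    """Fraction of each query's attention that lands on style tokens.

    Hypothetical sketch: style keys are multiplied by the scalar `lam`
    before the scaled dot product. This is equivalent to dividing the
    style logits by a temperature of 1 / lam.

    q:       (N_q, d) queries
    k_other: (N_o, d) non-style keys (e.g. latent and content tokens)
    k_style: (N_s, d) style keys
    """
    d = q.shape[-1]
    keys = np.concatenate([k_other, lam * k_style], axis=0)  # (N_o + N_s, d)
    weights = softmax(q @ keys.T / np.sqrt(d), axis=-1)       # (N_q, N_o + N_s)
    return weights[:, k_other.shape[0]:].sum(axis=-1)         # (N_q,)
```

For example, take `q = np.ones((1, 4))`, `k_other = np.zeros((2, 4))` and `k_style = np.ones((2, 4))`. The style mass rises from 0.5 at `lam=0` to about 0.88 at `lam=1`. Because this scaling only sharpens or flattens the style logits, it increases style attention only for queries whose dot products with the style keys are positive. A real implementation may instead scale the keys per token or apply the scaling in a different place.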

Paper Structure

This paper contains 20 sections, 6 equations, 16 figures, and 4 tables.

Figures (16)

  • Figure 1: Multi-view 3D head stylization by DiffStyle360. Our method generates identity-preserving and multi-view consistent stylizations across diverse artistic styles, including cartoon-like and realistic domains. DiffStyle360 successfully preserves fine attributes such as accessories, facial expressions, and head geometry, while achieving high style fidelity without requiring per-style retraining.
  • Figure 2: An overview of our proposed framework. (a) Architecture Overview: The model takes a content image $I_{\text{content}}$, a style reference $I_{\text{style}}$, and a target camera pose $I_{\text{cam}}$ as input. We employ separate, disentangled Content and Style Appearance Modules to extract identity and style features, respectively. These features, along with pose information from a ControlNet, are fed into the main U-Net, where our novel Style Fusion Attention layers merge them. A trainable View Consistency Module ensures coherence across multiple views. (b) Style Fusion Attention Mechanism: The latent features ($\mathbf{F}_l$) from the main U-Net act as queries. To achieve balanced stylization, content keys ($\mathbf{K}_c$) are modulated by style keys ($\mathbf{K}_s$) using AdaIN. The queries then attend to a combined set of keys and values from the latent, content, and style features to produce an output that preserves identity while incorporating the desired style.
  • Figure 3: Qualitative results demonstrating the effect of key scaling on stylization.
  • Figure 4: Qualitative stylization results of 3D head stylization methods, provided in 360-degree views.
  • Figure 5: Qualitative stylization results. Top: Generalization to unseen artistic styles (Fauvism and Art Deco). Bottom: Style variations within the Anime domain.
  • ...and 11 more figures
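The Figure 2(b) caption describes Style Fusion Attention only in words. The NumPy sketch below is one reading of that caption and is not the authors' implementation. It rests on several assumptions: a single attention head, hypothetical shared projection matrices `w_q`, `w_k` and `w_v`, AdaIN statistics computed per channel over the token axis, and values taken unmodified from the latent, content and style features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adain(x, y, eps=1e-5):
    # AdaIN: renormalize x so its per-channel mean and std over tokens
    # match those of y. x: (N_x, d), y: (N_y, d).
    mu_x, sd_x = x.mean(axis=0), x.std(axis=0) + eps
    mu_y, sd_y = y.mean(axis=0), y.std(axis=0)
    return sd_y * (x - mu_x) / sd_x + mu_y


def style_fusion_attention(f_l, f_c, f_s, w_q, w_k, w_v):
    """Single-head sketch of the attention described in Figure 2(b).

    f_l: latent tokens from the main U-Net, used as queries (N_l, d)
    f_c: content tokens (N_c, d)
    f_s: style tokens (N_s, d)
    w_q, w_k, w_v: hypothetical shared projection matrices (d, d)
    """
    q = f_l @ w_q
    k_l, k_c, k_s = f_l @ w_k, f_c @ w_k, f_s @ w_k
    v_l, v_c, v_s = f_l @ w_v, f_c @ w_v, f_s @ w_v
    # Content keys are modulated by the style keys' statistics.
    k_c = adain(k_c, k_s)
    # Queries attend to the combined latent, content and style tokens.
    keys = np.concatenate([k_l, k_c, k_s], axis=0)
    values = np.concatenate([v_l, v_c, v_s], axis=0)
    attn = softmax(q @ keys.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ values  # (N_l, d)
```

In this reading, AdaIN changes only the content keys, which control where the queries attend. The content values, which carry the content information, are left unchanged. That is one plausible way to get the caption's "preserves identity while incorporating the desired style," but the paper's equations may differ.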