SAT3D: Image-driven Semantic Attribute Transfer in 3D

Zhijun Zhai, Zengmao Wang, Xiaoxiao Long, Kaixuan Zhou, Bo Du

TL;DR

SAT3D addresses image-driven semantic attribute transfer in 3D-aware generative models. It uses a Meta Attribute Mask Matrix to learn correlations between semantic attributes and style-space channels, and it guides edits with a CLIP-based Quantitative Measurement Module built on phrase-based descriptor groups. Target-transfer and irrelevant-preservation losses steer the migration of attributes from a reference image while suppressing unintended changes, which enables 3D-consistent attribute edits without fine-tuning. The approach demonstrates 3D-aware transfers across multiple domains and also adapts to 2D generators, highlighting its flexibility and the customizable control that reference images provide. Overall, SAT3D delivers precise, interpretable, and photorealistic attribute editing that outperforms region-based replacement and text- or classifier-guided methods while maintaining multi-view consistency.
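The TL;DR does not spell out how the Meta Attribute Mask Matrix is applied during an edit. One plausible reading, sketched below as an assumption rather than the authors' formulation, is that each attribute owns a row of per-channel weights in [0, 1] over a flattened style space. Under that reading, an edit moves the source style code toward the reference code only on the masked channels, scaled by the editing intensity δ. The function name `masked_transfer`, the flat-list layout, and the channel-wise max used to combine several attribute masks are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: a style code is a flat list of channel values, and the
# mask matrix maps each attribute name to per-channel weights in [0, 1].
StyleCode = List[float]


def masked_transfer(
    s_src: StyleCode,
    s_ref: StyleCode,
    mask_matrix: Dict[str, List[float]],
    target_attrs: List[str],
    delta: float = 1.0,
) -> StyleCode:
    """Assumed edit rule: s_edit = s_src + delta * m * (s_ref - s_src), channel-wise."""
    n = len(s_src)
    if len(s_ref) != n:
        raise ValueError("style codes must have the same length")
    # Combine the masks of all target attributes with a channel-wise maximum.
    combined = [0.0] * n
    for attr in target_attrs:
        combined = [max(c, m) for c, m in zip(combined, mask_matrix[attr])]
    # Channels with mask 0 keep the source value; mask 1 with delta=1 copies the reference.
    return [s + delta * m * (r - s) for s, r, m in zip(s_src, s_ref, combined)]


masks = {"Hairstyle": [1.0, 0.0, 0.5, 0.0], "Beard": [0.0, 1.0, 0.0, 0.0]}
print(masked_transfer([0.0, 0.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0], masks, ["Hairstyle"]))
```

With δ = 1 and a binary mask, this rule copies the reference exactly on the selected channels and leaves all other channels untouched. Fractional mask values and other δ values interpolate between the two codes.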

Abstract

The GAN-based image editing task aims to manipulate image attributes in the latent space of generative models. Most previous 2D and 3D-aware approaches edit attributes either with ambiguous semantics or by copying regions from a reference image, and therefore cannot achieve photographic semantic attribute transfer, such as transferring the beard from a photo of a man. In this paper, we propose SAT3D, an image-driven Semantic Attribute Transfer method in 3D that edits semantic attributes according to a reference image. SAT3D operates in the style space of a pre-trained 3D-aware StyleGAN-based generator and learns the correlations between semantic attributes and style code channels. For guidance, we associate each attribute with a set of phrase-based descriptor groups and develop a Quantitative Measurement Module (QMM) that leverages the image-text comprehension capability of CLIP to quantitatively describe the attribute characteristics of images based on these descriptor groups. During training, the QMM is incorporated into attribute losses that compute attribute similarity between images, guiding the transfer of target semantics and the preservation of irrelevant semantics. We present 3D-aware attribute transfer results across multiple domains and compare SAT3D with classical 2D image editing methods, demonstrating its effectiveness and customizability.
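The QMM can be pictured as CLIP zero-shot classification run once per descriptor group. The sketch below makes three assumptions. First, image and phrase embeddings have already been produced by a CLIP encoder, so the code does not call CLIP itself. Second, the phrases within one group act as competing labels. Third, their scaled cosine similarities are turned into a softmax distribution. The function `qmm_scores`, the toy 2D embeddings, and the default scale of 100 (chosen to resemble CLIP's logit scale) are illustrative, not the paper's specification.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def _softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def qmm_scores(
    image_emb: Vector,
    descriptor_groups: Dict[str, List[Vector]],
    logit_scale: float = 100.0,
) -> Dict[str, List[float]]:
    """Hypothetical QMM: one probability distribution over phrases per descriptor group."""
    img = _normalize(image_emb)
    scores: Dict[str, List[float]] = {}
    for name, phrase_embs in descriptor_groups.items():
        logits = [logit_scale * sum(a * b for a, b in zip(img, _normalize(t))) for t in phrase_embs]
        scores[name] = _softmax(logits)
    return scores


# Toy group standing in for phrases such as "long hair" vs. "short hair".
groups = {"hair length": [[1.0, 0.0], [0.0, 1.0]]}
print(qmm_scores([0.9, 0.1], groups))
```

Normalizing within each group means the module reports relative evidence among competing descriptions instead of raw similarities. That makes scores from two different images directly comparable.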

Paper Structure

This paper contains 11 sections, 7 equations, and 11 figures.

Figures (11)

  • Figure 1: Comparison of different image editing tasks. Existing methods edit with ambiguous semantics or with regions copied from images. Semantic attributes are generally described by texts or classifiers, which suffer from ambiguity; illustrating the specific characteristics of an attribute with a reference image clarifies the description. Existing image-driven methods rely on region-wide replacement and cannot migrate semantic attributes such as beards. In contrast, SAT3D is an image-driven, semantic-based method that edits semantic attributes according to the details of reference images.
  • Figure 2: The attribute transfer pipeline of SAT3D. Built on pre-trained 2D or 3D-aware generators, SAT3D learns a meta attribute mask matrix to explore correlations between semantic attributes and the style code channels of the style space $\mathcal{S}$. For training guidance, we define a set of descriptor groups $\Omega$ for each attribute and develop a Quantitative Measurement Module (QMM) that measures attribute characteristics in images using the zero-shot prediction capability of CLIP. With the QMM, attribute losses are designed for target attribute transfer and irrelevant attribute preservation. In this example, the target attribute set is $\Omega=\{\text{Hairstyle}\}$ and the editing intensity along the editing direction is $\delta=1$. A lock icon indicates that a module's parameters are frozen.
  • Figure 3: Visual comparison of 3D-aware attribute transfer. Beyond producing competitive editing results, SAT3D can customize attributes according to reference images.
  • Figure 4: Visualization of attribute transfer on the EG3D generator pre-trained on the FFHQ dataset.
  • Figure 5: Additional visualizations of attribute transfer on EG3D generators pre-trained on AFHQv2 Cats ($512^2$) and ShapeNet Car ($128^2$), respectively.
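Given QMM outputs, the two attribute losses named in the Figure 2 caption could take a form like the following. This is a hypothetical sketch with three assumptions. Each attribute is summarized by the concatenated QMM distributions of its descriptor groups. An L1 distance serves as the attribute dissimilarity. The two terms are combined with illustrative weights `w_transfer` and `w_preserve`. The paper's actual distance and weighting may differ.

```python
from typing import Dict, List, Set

# attribute name -> concatenated QMM distributions of its descriptor groups (assumed)
Scores = Dict[str, List[float]]


def l1_distance(p: List[float], q: List[float]) -> float:
    """Hypothetical attribute dissimilarity: L1 distance between score vectors."""
    if len(p) != len(q):
        raise ValueError("score vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute_losses(
    edited: Scores,
    source: Scores,
    reference: Scores,
    target: Set[str],
    w_transfer: float = 1.0,
    w_preserve: float = 1.0,
) -> Dict[str, float]:
    # Target transfer: on target attributes, the edited image should match the reference.
    transfer = [l1_distance(edited[a], reference[a]) for a in target]
    # Irrelevant preservation: on all other attributes, it should still match the source.
    preserve = [l1_distance(edited[a], source[a]) for a in edited if a not in target]
    l_t = sum(transfer) / len(transfer) if transfer else 0.0
    l_p = sum(preserve) / len(preserve) if preserve else 0.0
    return {
        "target_transfer": l_t,
        "irrelevant_preservation": l_p,
        "total": w_transfer * l_t + w_preserve * l_p,
    }


src = {"Hairstyle": [0.9, 0.1], "Beard": [0.2, 0.8]}
ref = {"Hairstyle": [0.1, 0.9], "Beard": [0.9, 0.1]}
edit = {"Hairstyle": [0.2, 0.8], "Beard": [0.2, 0.8]}
print(attribute_losses(edit, src, ref, {"Hairstyle"}))
```

In this toy case the hairstyle has moved most of the way toward the reference while the beard is unchanged. The transfer term is therefore small but nonzero, and the preservation term is zero.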