RefineStyle: Dynamic Convolution Refinement for StyleGAN

Siwei Xia, Xueqi Hu, Li Sun, Qingli Li

TL;DR

RefineStyle tackles out-of-domain synthesis with a pre-trained StyleGAN2 by refining dynamic kernels through low-rank residuals. It introduces two token sets per layer to form a residual $\Delta W_n = P_n^T \otimes Q_n$, refined by learnable scalings, producing a refined kernel $W_n^d = \Delta W_n \oplus (s_n \odot W_n^0)$. The method can be applied to image inversion (one-stage or two-stage with grouped transformer blocks) and domain adaptation (text- or image-guided via CLIP) and shows improved inversion quality, faithful out-of-domain editing, and competitive domain transfer with a lightweight parameter budget. Experiments on FFHQ and LSUN Cars demonstrate lower distortions and higher-quality synthesis compared to baselines, indicating practical viability for efficient, controllable style refinement in generative models.

Abstract

In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors $w^+\in\mathcal{W}^+$ specific to each image. Therefore, the $\mathcal{W}^+$ space is often used for image inversion and editing. However, a pre-trained model struggles to synthesize out-of-domain images due to the limited capabilities of $\mathcal{W}^+$ and its resultant kernels, necessitating full fine-tuning or adaptation through a complex hypernetwork. This paper proposes an efficient refining strategy for dynamic kernels. The key idea is to modify kernels by low-rank residuals, learned from the input image or from domain guidance. These residuals are generated by matrix multiplication between two token sets containing the same number of tokens, and this number controls the complexity of the residual. We validate the refining scheme in image inversion and domain adaptation. In the former task, we design grouped transformer blocks to learn these token sets by one- or two-stage training. In the latter task, token sets are directly optimized to support synthesis in the target domain while preserving original content. Extensive experiments show that our method achieves low distortions for image inversion and high quality for out-of-domain editing.
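
The low-rank refinement described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the token layouts (`p` of shape `(r, c_out)`, `q` of shape `(r, c_in * k * k)`), the reading of $\oplus$ as element-wise addition, and the function name `refine_kernel` are all hypothetical. StyleGAN2's demodulation step and the learnable scalings mentioned in the TL;DR are omitted.

```python
import numpy as np

def refine_kernel(w0, s, p, q):
    """Modulate a static kernel and add a low-rank residual built from two token sets.

    w0: static kernel, shape (c_out, c_in, k, k)
    s:  per-input-channel modulation factors, shape (c_in,)
    p:  token set, shape (r, c_out)           (assumed layout)
    q:  token set, shape (r, c_in * k * k)    (assumed layout)
    """
    c_out, c_in, k, _ = w0.shape
    # Standard StyleGAN2-style modulation: scale each input channel by s.
    modulated = w0 * s[None, :, None, None]
    # Residual of rank at most r, where r is the number of tokens per set.
    delta = (p.T @ q).reshape(c_out, c_in, k, k)
    return modulated + delta, delta

rng = np.random.default_rng(0)
c_out, c_in, k, r = 8, 6, 3, 2
w0 = rng.standard_normal((c_out, c_in, k, k))
s = rng.standard_normal(c_in)
p = rng.standard_normal((r, c_out))
q = rng.standard_normal((r, c_in * k * k))
w_d, delta = refine_kernel(w0, s, p, q)
```

Under these assumptions the residual adds `r * (c_out + c_in * k * k)` parameters per layer instead of `c_out * c_in * k * k`, which is how the token count would bound the complexity.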

Paper Structure (23 sections, 7 equations, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Simulation results for RefineStyle.
  • Figure 2: (a) Modulated convolution in the n-th layer of StyleGAN2. (b) The proposed refined modulated convolution.
  • Figure 3: Sorted singular values of dynamic kernels. For each layer, we perform SVD on the kernel weight $W_n$ and list all singular values in descending order. Note that the first few values account for most of the total, indicating that the kernels are approximately low-rank.
  • Figure 4: RefineStyle applied to image inversion. Given a real image $I$, the model specifies a $w^+$ code to form the dynamic convolution kernels, and tokens $P$ and $Q$ to refine them, as shown on the right. The two training strategies are distinguished by different background colors on the left. In one-stage training (green), the initial $\tilde{w}^+$, $\tilde{P}$ and $\tilde{Q}$ are updated with real image features encoded by $E$. In two-stage training (grey), a pre-trained inverter gives $w_0^+$ and the initial inversion $\hat{I}_0$. The model takes the concatenated $[I,\hat{I}_0]$ as input to update $\tilde{P}$ and $\tilde{Q}$, while $w_0^+$ is kept fixed for modulation.
  • Figure 5: Qualitative comparison of image inversion. Ours(1) and Ours(2) denote the one- and two-stage models. RefineStyle excels at reconstructing details, such as the mouth in the 1st row, the hair in the 2nd row and the background in the 4th row.
  • ...and 3 more figures
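
The low-rank check described in the Figure 3 caption could be reproduced along the following lines. This is a minimal sketch on synthetic data: the helper `energy_fraction` is hypothetical, and the kernel here is a random rank-4 matrix plus small noise standing in for a real StyleGAN2 layer, so the number it yields says nothing about the paper's measurements.

```python
import numpy as np

def energy_fraction(w, top):
    """Fraction of the singular-value sum carried by the `top` largest values."""
    mat = w.reshape(w.shape[0], -1)            # flatten to (c_out, c_in * k * k)
    sv = np.linalg.svd(mat, compute_uv=False)  # singular values, descending order
    return sv[:top].sum() / sv.sum()

rng = np.random.default_rng(0)
# Synthetic stand-in for a kernel: rank-4 structure plus small noise (assumption).
a = rng.standard_normal((64, 4))
b = rng.standard_normal((4, 64 * 9))
noise = 0.01 * rng.standard_normal((64, 64 * 9))
w = (a @ b + noise).reshape(64, 64, 3, 3)

frac = energy_fraction(w, top=4)
```

A fraction close to 1 for a small `top` is what the caption means by the first few values constituting most of the total.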