Diversity-aware Channel Pruning for StyleGAN Compression

Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim, Jae-Pil Heo

TL;DR

The paper proposes a channel pruning method that scores each channel's importance by its sensitivity to latent vector perturbations. Channels vary in this sensitivity, which is a key factor in sample diversity, so pruning by this score improves the diversity of samples from the compressed model.

Abstract

StyleGAN has shown remarkable performance in unconditional image generation. However, its high computational cost poses a significant challenge for practical applications. Although recent efforts have been made to compress StyleGAN while preserving its performance, existing compressed models still lag behind the original model, particularly in terms of sample diversity. To overcome this, we propose a novel channel pruning method that leverages the varying sensitivities of channels to latent vectors, which is a key factor in sample diversity. Specifically, by assessing channel importance based on sensitivity to latent vector perturbations, our method enhances the diversity of samples in the compressed model. Since our method focuses solely on the channel pruning stage, it complements prior training schemes without additional training cost. Extensive experiments demonstrate that our method significantly enhances sample diversity across various datasets. Moreover, in terms of FID scores, our method not only surpasses the state of the art by a large margin but also achieves comparable scores with only half the training iterations.
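
As a rough illustration of what pruning by a per-channel importance score involves, the sketch below keeps the highest-scoring output channels of one convolution and slices the matching input channels of the next. It is a minimal NumPy sketch, not the authors' implementation: the function name `prune_channels`, the `keep_ratio` parameter, and the weight layout `(out_ch, in_ch, kH, kW)` are assumptions made for this example, and the scores are placeholder values rather than sensitivities computed from a real generator.

```python
import numpy as np

def prune_channels(weight, next_weight, scores, keep_ratio):
    """Keep the top-scoring output channels of one conv layer (hypothetical helper).

    weight:      (out_ch, in_ch, kH, kW) weights of the layer being pruned
    next_weight: (next_out, out_ch, kH, kW) weights of the following layer
    scores:      (out_ch,) importance score per output channel
    keep_ratio:  fraction of output channels to retain
    """
    n_out = weight.shape[0]
    n_keep = max(1, int(round(n_out * keep_ratio)))
    # Indices of the n_keep largest scores, restored to ascending channel order.
    keep = np.sort(np.argsort(scores)[-n_keep:])
    # Drop pruned output channels here and the matching inputs of the next layer.
    return weight[keep], next_weight[:, keep], keep

# Placeholder weights and scores, chosen only to make the example concrete.
weight = np.arange(4 * 3 * 3 * 3, dtype=np.float32).reshape(4, 3, 3, 3)
next_weight = np.ones((5, 4, 3, 3), dtype=np.float32)
scores = np.array([0.1, 0.9, 0.3, 0.7])

pruned, pruned_next, keep = prune_channels(weight, next_weight, scores, keep_ratio=0.5)
print(keep)  # channels 1 and 3 have the two highest scores
```

The score vector is the only method-specific input here; any importance criterion can be plugged in without changing the slicing logic.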

Paper Structure (16 sections, 5 equations, 12 figures, 9 tables)

Figures (12)

  • Figure 1: (a) Intuitive illustration of our method. We compare four channels (Ch. #1, 2, 3, 4) by evaluating their responses when we pass the same latent vector $w$ and its perturbed counterpart $(w+\alpha d)$. By investigating the contribution of each channel to the resulting image difference, we determine the sensitivity of channels to the latent perturbation. In this example, Ch. #4 is highly sensitive to the perturbation, while Ch. #1, 2, 3 exhibit low sensitivity. Consequently, to preserve sample diversity, Ch. #4 should be retained. (b) Our overall framework. We aim to assess the contribution of each channel to the sample diversity by measuring its sensitivity to latent vector perturbation. In detail, 1) we sample a directional vector for the perturbation, 2) we compute the image-level difference caused by the latent vector perturbation, and 3) we calculate channel-wise gradient magnitudes induced by the difference image. Each channel's gradient magnitude then serves as the estimate of its sensitivity, and hence of its contribution to sample diversity.
  • Figure 2: Overall compression framework. The framework consists of two stages: channel pruning and distillation. The channel-pruning stage initializes a compact student model by pruning channels of a larger teacher model. Specifically for the StyleGAN2 architecture, the pruning process usually focuses on reducing channels in the synthesis network, while retaining the mapping network. The distillation stage further trains the student model with several training objectives such as adversarial and distillation losses. In this paper, we focus on the channel-pruning stage.
  • Figure 3: (a) Precision-Recall Curve. By adjusting the truncation trick parameter $\psi$ within the range [0.5, 1.0] with step size 0.1, we visualize the Precision-Recall curve of the proposed method and baselines. Our method surpasses the baselines across the entire range of precision and recall. (b) FID w.r.t. Training Iterations. We plot the FID curve during training; the proposed method reaches the previous state-of-the-art FID with 2$\times$ fewer iterations.
  • Figure 4: Qualitative comparison with baselines on various datasets. We visualize samples generated by our method and the baselines on the FFHQ-256, LSUN Church-256, and LSUN Horse-256 datasets. Each column corresponds to samples generated from the same noise vector $z$. The L1 distance between teacher and student outputs, averaged over 10K samples, is reported below each method. The proposed method yields the lowest distortion, indicating that it better preserves diversity in the image space.
  • Figure 5: Specific examples of pruned channels. (a) We provide a scatter plot of the $S^{\mu}$ and $S^{\sigma}$ scores of all channels in the 6$^{\text{th}}$ layer of the teacher generator trained on FFHQ-256. Brown and green dots represent channels that are always pruned and never pruned, respectively. Blue and red dots indicate channels that survive only under $S^{\mu}$ and only under $S^{\sigma}$, respectively. (b) The 350$^{\text{th}}$ channel exhibits high sensitivity to the 14$^{\text{th}}$ direction from PCA. (c) The 14$^{\text{th}}$ direction corresponds to an age-related perturbation. The $S^{\mu}$ score prunes the 350$^{\text{th}}$ channel, while the $S^{\sigma}$ score preserves this channel, which demonstrates high sensitivity to age variation. This result suggests that, compared to $S^{\mu}$, $S^{\sigma}$ tends to retain channels responsible for semantic image changes.
  • ...and 7 more figures
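
The three steps in the Figure 1(b) caption (sample a direction, compute the image difference, read off channel-wise gradient magnitudes) can be sketched on a toy generator. Everything below is an assumption made for illustration and not the paper's StyleGAN2 setup: the generator is `image = sum_c gain_c * tanh(M w)_c * P_c` with all gains fixed at 1, the loss is half the squared norm of the difference image, and the gradient with respect to each channel's gain is written out analytically instead of using autograd. The names `channel_sensitivity`, `M`, and `P` are hypothetical.

```python
import numpy as np

def channel_sensitivity(M, P, w_batch, directions, alpha=1.0):
    """Toy per-channel sensitivity to latent perturbations (illustrative only).

    M:          (C, D) maps a latent w to channel activations h = tanh(M @ w)
    P:          (C, H, W) fixed spatial pattern contributed by each channel
    w_batch:    (N, D) latent vectors
    directions: (N, D) perturbation directions, one per latent
    alpha:      perturbation strength
    """
    C = M.shape[0]
    patterns = P.reshape(C, -1)                  # (C, H*W)
    scores = np.zeros(C)
    for w, d in zip(w_batch, directions):
        d = d / np.linalg.norm(d)                # step 1: unit direction
        h = np.tanh(M @ w)
        h_pert = np.tanh(M @ (w + alpha * d))
        dh = h_pert - h                          # change in each channel's activation
        diff_img = dh @ patterns                 # step 2: difference image, (H*W,)
        # step 3: gradient of 0.5 * ||diff_img||^2 w.r.t. each channel gain,
        # which is <diff_img, dh_c * P_c> for channel c.
        grad = dh * (patterns @ diff_img)
        scores += np.abs(grad)
    return scores / len(w_batch)

# Toy setup mirroring Figure 1(a): only the last channel reacts to the latent.
rng = np.random.default_rng(0)
C, D = 4, 8
M = np.zeros((C, D))
M[3] = rng.normal(size=D)
P = rng.normal(size=(C, 16, 16))
w_batch = rng.normal(size=(32, D))
directions = rng.normal(size=(32, D))

scores = channel_sensitivity(M, P, w_batch, directions, alpha=0.5)
most_sensitive = int(np.argmax(scores))
print(most_sensitive)  # index 3, the only channel whose activation depends on w
```

In this toy model a channel whose activation does not change under the perturbation receives a zero gradient, so it ranks last and would be pruned first. With a real generator, the gradient would come from backpropagation through the network.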