
Spectral Normalization and Dual Contrastive Regularization for Image-to-Image Translation

Chen Zhao, Wei-Ling Cai, Zheng Yuan

TL;DR

SN-DCR, a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization, is proposed; the reported results show that it achieves state-of-the-art (SOTA) performance on multiple tasks.

Abstract

Existing image-to-image (I2I) translation methods achieve state-of-the-art performance by incorporating patch-wise contrastive learning into Generative Adversarial Networks. However, patch-wise contrastive learning focuses only on local content similarity and neglects the global structure constraint, which degrades the quality of the generated images. In this paper, we propose SN-DCR, a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization. To maintain consistency of the global structure and texture, we design a dual contrastive regularization in which each of the two losses operates in a different deep feature space. To improve the global structure of the generated images, we formulate a semantic contrastive loss that makes their global semantic structure similar to that of real images from the target domain in the semantic feature space. To improve the global texture of the generated images, we similarly design a style contrastive loss, using Gram matrices to extract the texture style of images. Moreover, to enhance the stability of the model, we employ a spectral normalized convolutional network in the design of our generator. We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results show that our method achieves SOTA performance on multiple tasks.
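
As a rough illustration of the style branch described in the abstract, the sketch below computes Gram matrices from feature maps and scores a generated image against target-domain positives and source-domain negatives. The abstract does not state the exact loss, so the InfoNCE-style form, the cosine similarity, the temperature value, and all function names (`gram_matrix`, `contrastive_loss`, `style_contrastive_loss`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (c * h * w)


def contrastive_loss(anchor, positives, negatives, temperature=0.07):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    The loss is small when the anchor is close to the positives and far
    from the negatives under cosine similarity.
    """
    def unit(v):
        v = np.asarray(v, dtype=np.float64).ravel()
        return v / (np.linalg.norm(v) + 1e-12)

    a = unit(anchor)
    pos = np.array([a @ unit(p) for p in positives]) / temperature
    neg = np.array([a @ unit(n) for n in negatives]) / temperature

    losses = []
    for s in pos:
        logits = np.concatenate(([s], neg))
        m = logits.max()  # subtract the max for a stable log-sum-exp
        log_denominator = m + np.log(np.exp(logits - m).sum())
        losses.append(log_denominator - s)
    return float(np.mean(losses))


def style_contrastive_loss(generated, target_feats, source_feats,
                           temperature=0.07):
    """Contrast the Gram matrix of the generated features with those of
    target-domain (positive) and source-domain (negative) features."""
    g = gram_matrix(generated)
    positives = [gram_matrix(t) for t in target_feats]
    negatives = [gram_matrix(s) for s in source_feats]
    return contrastive_loss(g, positives, negatives, temperature)
```

Under the same assumptions, a semantic counterpart could call `contrastive_loss` directly on feature vectors from a semantic encoder instead of on Gram matrices.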

Paper Structure

This paper contains 14 sections, 9 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Visual comparison with all baselines on the Van Gogh$\rightarrow$Photo dataset. The global structure of the images generated by previous methods is corrupted, while our SN-DCR preserves the global structure information and generates photos with more natural details and better global texture. Note that CUT and CycleGAN fail to generate a valid output for the first input.
  • Figure 2: Overall framework of our proposed SN-DCR. A cat (the input image) is translated by the generator G into a dog (the generated image). We introduce a dual contrastive regularization that combines semantic and style contrastive losses to pull the generated image closer to the real images of the target domain (positives) while pushing it away from the real images of the source domain (negatives).
  • Figure 3: Our proposed spectral normalized generator, denoted as G, uses instance normalization (IN) and a Frequency Channel Attention Network (FCA). It also uses our novel spectral normalized residual block (SN ResBlock), with nine such blocks in the middle of the architecture. Following the configuration of CUT, we extract features from five different layers to compute the multi-layer patch-wise contrastive loss: the RGB pixels, the first two downsampling convolutions, and the first and fifth residual blocks.
  • Figure 4: SN residual block. The SN residual block enhances the stability of training and helps the generator extract more complex features.
  • Figure 5: Visual comparison with all baselines on the Horse$\rightarrow$Zebra and Cat$\rightarrow$Dog datasets. Compared with all baselines, our SN-DCR not only performs better in terms of global structure and texture, but also generates more realistic images with more natural detail.
  • ...and 2 more figures
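
The captions for Figures 3 and 4 refer to spectral normalized convolutions without saying how the normalization is computed. A common way to do it is power iteration on the reshaped weight, sketched below under that assumption. The function name `spectral_normalize`, its parameters, and the NumPy setting are hypothetical and do not come from the paper.

```python
import numpy as np


def spectral_normalize(weight, n_iters=1, u=None, eps=1e-12):
    """Divide a weight tensor by an estimate of its largest singular value.

    weight: array of shape (out_channels, ...), e.g. a conv kernel.
    n_iters: number of power-iteration steps (at least one is performed).
    u: optional left singular vector estimate carried over from an
       earlier call, so that one step per update can suffice.
    Returns the normalized weight and the updated u.
    """
    w = weight.reshape(weight.shape[0], -1)
    if u is None:
        # Fixed seed keeps this sketch deterministic.
        u = np.random.default_rng(0).standard_normal(w.shape[0])
    u = u / (np.linalg.norm(u) + eps)

    for _ in range(max(1, n_iters)):
        v = w.T @ u
        v = v / (np.linalg.norm(v) + eps)
        u = w @ v
        u = u / (np.linalg.norm(u) + eps)

    sigma = u @ w @ v  # estimate of the spectral norm of w
    return weight / sigma, u
```

Reusing the returned `u` across training steps is the usual reason a single iteration per step is often considered enough. Whether SN-DCR does this is not stated in the captions above.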