Image-to-Image Translation: Methods and Applications

Yingxue Pang, Jianxin Lin, Tao Qin, Zhibo Chen

TL;DR

This survey catalogs image-to-image translation (I2I) techniques, organizing methods into two-domain and multi-domain settings and distinguishing supervised, unsupervised, semi-supervised, and few-shot paradigms. It analyzes the core generative backbones (VAEs and GANs), details stabilization strategies, and enumerates objective and subjective evaluation metrics. The survey covers a broad range of models, from pix2pix and CycleGAN to SPADE, StarGAN v2, and CoCosNet, across diverse applications, and discusses practical considerations such as data requirements, multimodal outputs, and domain adaptation. By mapping algorithmic advances to concrete tasks and datasets, it highlights both the progress made and the challenges that remain in producing high-fidelity, diverse translations at scale. The overall contribution is a consolidated reference that helps researchers and practitioners navigate I2I methods and their applications and identify gaps for future work.

Abstract

Image-to-image translation (I2I) aims to transfer images from a source domain to a target domain while preserving the content representations. I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis, segmentation, style transfer, restoration, and pose estimation. In this paper, we provide an overview of the I2I works developed in recent years. We will analyze the key techniques of the existing I2I works and clarify the main progress the community has made. Additionally, we will elaborate on the effect of I2I on the research and industry community and point out remaining challenges in related fields.

Paper Structure

This paper contains 38 sections, 14 equations, 21 figures, and 5 tables.

Figures (21)

  • Figure 1: An illustrative example of image-to-image translation (I2I). (Left): How can you make your selfie look like a cartoonist's drawing? This kind of task can be broadly regarded as an I2I problem. (Right): Taking a selfie as the source image and a cartoon as the target reference, you can "translate" the selfie into an image in the desired artistic style.
  • Figure 2: An overview of image-to-image translation methods. This figure shows the relationship between different methods and where they intersect with each other.
  • Figure 3: The structure of a VAE.
  • Figure 4: The structure of unconditional GANs, where $z$, $G$, and $D$ denote the random noise, generator, and discriminator, respectively.
  • Figure 5: The structure of conditional GANs, where $z$, $G$, and $D$ denote the random noise, generator, and discriminator, respectively. Conditional GANs usually feed additional information $y$ (such as data labels, text, or image attributes) to both the generator and the discriminator to steer generation toward the desired results.
  • ...and 16 more figures