Unpaired Image-to-Image Translation with Content Preserving Perspective: A Review

Mehran Safayani, Behnaz Mirzapour, Hanieh Aghaebrahimiyan, Nasrin Salehi, Hamid Ravaee

TL;DR

This survey examines unpaired image-to-image (I2I) translation with a focus on content preservation, organizing methods by architecture (GAN-, VAE-, diffusion-, flow-, and transformer-based). It groups tasks into three content-preservation categories: fully content preserving (FCP), partially content preserving (PCP), and non-content preserving (NCP). It analyzes nearly 70 models, more than 10 quantitative evaluation metrics, and 30 tasks and datasets, and it provides a Sim-to-Real benchmark. Key contributions include a structured taxonomy of methods, a catalog of datasets and evaluation criteria, and empirical benchmarks illustrating how content preservation varies across tasks. The work guides practitioners in choosing appropriate I2I models for specific applications and emphasizes content preservation as a central criterion for method selection and evaluation.

Abstract

Image-to-image translation (I2I) transforms an image from a source domain to a target domain while preserving source content. Most computer vision applications, such as style transfer, image segmentation, and photo enhancement, fall within the field of image-to-image translation. The degree to which the content of the source image must be preserved during translation depends on the problem and the intended application. From this point of view, we divide the tasks in the field of image-to-image translation into three categories: fully content preserving, partially content preserving, and non-content preserving. For each of these three categories, we present tasks, datasets, methods, and the results of those methods. We categorize I2I methods by model architecture and study each category separately. In addition, we introduce well-known evaluation criteria in the I2I translation field. Specifically, nearly 70 different I2I models are analyzed, and more than 10 quantitative evaluation metrics and 30 distinct tasks and datasets relevant to the I2I translation problem are introduced and assessed. Translating simulated images to real images can be viewed as an application of fully content preserving or partially content preserving unsupervised image-to-image translation methods, so we provide a benchmark for Sim-to-Real translation that can be used to evaluate different methods. In general, we conclude that, because the obligation to preserve content differs across applications, this requirement should be taken into account when choosing a suitable I2I model for a specific application.

Paper Structure

This paper contains 43 sections, 15 equations, 18 figures, and 14 tables.

Figures (18)

  • Figure 1: Several image-to-image translation problems.
  • Figure 2: Visual comparison of how well different models (CycleGAN [zhu2017unpaired], MUNIT [huang2018multimodal], GcGAN [fu2019geometry], NICE-GAN [chen2020reusing], and IrwGAN [xie2021unaligned]) preserve image content [xie2021unaligned]. Tasks, in row order: Lion2Tiger and Dog2Cat.
  • Figure 3: An overview of image-to-image translation methods.
  • Figure 4: GAN-based image-to-image translation methods.
  • Figure 5: Conditional Generative Adversarial Network.
  • ...and 13 more figures