
RoNet: Rotation-oriented Continuous Image Translation

Yi Li, Xin Xie, Lina Lei, Haiyan Fu, Yanqing Guo

TL;DR

RoNet tackles continuous multi-domain image translation by representing domain relationships on a learned annular manifold and rotating a style vector within an automatically discovered 2D rotation plane. It jointly learns to disentangle content and style while enforcing semantic consistency through a VSA loss and a patch-based semantic style loss, improving texture realism on challenging forest, face, and street scenes. The method achieves better visual quality and continuity than multiple baselines, with favorable LPIPS, FID, and KID scores, and it supports high-resolution outputs. Overall, RoNet provides a general, end-to-end approach for smooth, cyclic domain translation from a single input image, applicable to seasonal variation, time-of-day shifts, and cross-domain photography styles.

Abstract

The generation of smooth and continuous images between domains has recently drawn much attention in image-to-image (I2I) translation. A linear relationship is the basic assumption of most existing approaches, whether it is applied to features, models, or labels. However, the linear assumption becomes harder to satisfy as the element dimension increases, and it requires obtaining both ends of the line. In this paper, we propose a novel rotation-oriented solution that models continuous generation as an in-plane rotation of an image's style representation, resulting in a network named RoNet. A rotation module is embedded in the generation network to automatically learn the proper plane while disentangling the content and the style of an image. To encourage realistic texture, we also design a patch-based semantic style loss that learns the different styles of similar objects in different domains. We conduct experiments on forest scenes (where complex textures make generation very challenging), faces, streetscapes, and the iphone2dslr task. The results validate the superiority of our method in terms of visual quality and continuity.
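One way to write the in-plane rotation out explicitly is shown below. It assumes the rotation plane is spanned by two orthonormal vectors $\vec{u}$ and $\vec{v}$; these symbols are hypothetical, since RoNet learns its plane and its exact parameterization may differ. The notation $\vec{S_1}$, $\vec{P_1}$, $\vec{R}$ follows the Figure 4 caption.

```latex
\begin{aligned}
a &= \langle \vec{S_1}, \vec{u} \rangle, \qquad b = \langle \vec{S_1}, \vec{v} \rangle,\\
\vec{P_1} &= a\,\vec{u} + b\,\vec{v}, \qquad \vec{R} = \vec{S_1} - \vec{P_1},\\
\vec{P}(\theta) &= (a\cos\theta - b\sin\theta)\,\vec{u} + (a\sin\theta + b\cos\theta)\,\vec{v},\\
\vec{S}(\theta) &= \vec{P}(\theta) + \vec{R}.
\end{aligned}
```

Because $\vec{R}$ is orthogonal to $\vec{u}$ and $\vec{v}$, under this formulation $\|\vec{S}(\theta)\| = \|\vec{S_1}\|$ for every $\theta$, $\vec{S}(0) = \vec{S_1}$, and $\vec{S}(\theta + 2\pi) = \vec{S}(\theta)$. A source vector and an angle are therefore enough to reach a target style $\vec{S_2} = \vec{S}(\theta)$, with no second endpoint required.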

Paper Structure (36 sections, 12 equations, 18 figures, 3 tables)

Figures (18)

  • Figure 1: The turning wheel of the four seasons generated by RoNet from a single input image (on the right, marked with a red dot).
  • Figure 2: High-definition results of RoNet. Images in one row are generated from a single source image by setting different rotation angles $\theta$. More results are presented in the Experiments section.
  • Figure 3: Visualized difference between vector rotation and interpolation.
  • Figure 4: Schematic of rotating a style vector from $\vec{S_1}$ to $\vec{S_2}$. First, $\vec{S_1}$ is mapped onto the rotation plane, which splits it into an in-plane component and a residual: $\vec{S_1} = \vec{P_1}+\vec{R}$. Second, $\vec{P_1}$ is rotated to $\vec{P_2}$ within the rotation plane. Finally, $\vec{S_2}=\vec{P_2}+\vec{R}$.
  • Figure 5: Overview of RoNet. The source image $I_{src}$ is disentangled into a content representation and a style representation by $E_c$ and $E_s$. Through alternating training of the style vector, the style representation of $I_{src}$ is rotated from the source domain to the target domain, guided by $I_{tgt}$. To further encourage realistic texture, we design a patch-based semantic style loss.
  • ...and 13 more figures
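Figures 3 and 4 describe the rotation geometrically. The following is a minimal NumPy sketch of that geometry, not RoNet's implementation. It assumes a fixed plane obtained by orthonormalizing two random vectors, whereas RoNet learns the plane. The helper names `plane_basis`, `rotate_in_plane`, and `lerp`, and the style dimension `d = 8`, are hypothetical. The sketch compares rotation with linear interpolation between the same two endpoints.

```python
import numpy as np

def plane_basis(w1, w2):
    """Gram-Schmidt: turn two vectors into an orthonormal basis (u, v) of a 2D plane.
    Hypothetical helper; RoNet learns its rotation plane instead of fixing one."""
    u = w1 / np.linalg.norm(w1)
    v = w2 - np.dot(w2, u) * u
    v = v / np.linalg.norm(v)
    return u, v

def rotate_in_plane(s, u, v, theta):
    """Rotate style vector s by angle theta inside span(u, v).
    s = p + r, where p is the in-plane component and r the orthogonal residual."""
    a, b = np.dot(s, u), np.dot(s, v)        # coordinates of p in the (u, v) basis
    r = s - (a * u + b * v)                  # residual, left untouched by the rotation
    c, sn = np.cos(theta), np.sin(theta)
    p_rot = (a * c - b * sn) * u + (a * sn + b * c) * v
    return p_rot + r

def lerp(s1, s2, t):
    """Linear-interpolation baseline: needs both endpoints s1 and s2."""
    return (1.0 - t) * s1 + t * s2

rng = np.random.default_rng(0)
d = 8                                        # hypothetical style dimension
u, v = plane_basis(rng.standard_normal(d), rng.standard_normal(d))
s1 = rng.standard_normal(d)                  # source style vector S_1
s2 = rotate_in_plane(s1, u, v, np.pi / 2)    # target style S_2: a quarter turn

# Compare the norms of the two paths between S_1 and S_2.
for theta in np.linspace(0.0, np.pi / 2, 5):
    s_rot = rotate_in_plane(s1, u, v, theta)
    s_lin = lerp(s1, s2, theta / (np.pi / 2))
    print(f"theta={theta:.3f}  |rot|={np.linalg.norm(s_rot):.4f}  "
          f"|lerp|={np.linalg.norm(s_lin):.4f}")
```

In this sketch the rotated vector keeps the norm of $\vec{S_1}$ for every $\theta$ and returns to $\vec{S_1}$ after a full turn ($\theta = 2\pi$). The interpolated vector's norm dips between the endpoints because the in-plane components partially cancel.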