CycleGAN, a Master of Steganography
Casey Chu, Andrey Zhmoginov, Mark Sandler
TL;DR
The paper shows that CycleGAN, which learns mappings between unpaired image domains, encodes details of the source image as a subtle high-frequency signal in the images it generates. This lets it satisfy the cycle-consistency loss while keeping the encoded information nearly imperceptible. The hidden encoding also makes the system particularly vulnerable to adversarial attacks, because small input perturbations can steer outputs toward chosen targets. The authors characterize the encoding as high-frequency and robust to low-frequency corruption, demonstrate targeted attacks via crafted maps, and discuss defenses such as increasing domain entropy or adding hidden channels. The work highlights risks in loss design for multi-network systems and suggests directions both for mitigating attacks and for potentially improving translation quality by preventing hidden encodings.
Abstract
CycleGAN (Zhu et al. 2017) is one recent successful approach to learning a transformation between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image in the images it generates, as a nearly imperceptible, high-frequency signal. This trick ensures that the generator can recover the original sample and thus satisfy the cyclic consistency requirement, while the generated image remains realistic. We connect this phenomenon with adversarial attacks by viewing CycleGAN's training procedure as training a generator of adversarial examples and demonstrate that the cyclic consistency loss causes CycleGAN to be especially vulnerable to adversarial attacks.
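
To make the mechanism concrete, the following is a minimal toy sketch and not the authors' model or experiments. It assumes 1-D lists of floats in place of images, a constant "cover" signal in place of a realistic generated image, and two hypothetical functions, `forward_map` and `backward_map`, standing in for the two CycleGAN generators. The amplitude `EPS` and the alternating-sign carrier are illustrative choices. The sketch shows that a low-amplitude, high-frequency residual is enough to drive an L1 cycle-consistency term to nearly zero, and that an equally small perturbation of the generated signal can steer the reconstruction to an arbitrary target.

```python
import random

EPS = 1e-3  # hypothetical amplitude of the hidden signal


def carrier(i):
    """Alternating +1/-1 pattern: the highest-frequency 1-D carrier."""
    return 1.0 if i % 2 == 0 else -1.0


def forward_map(x, cover):
    """Toy generator: the cover signal plus a faint, modulated copy of x."""
    return [c + EPS * carrier(i) * v for i, (c, v) in enumerate(zip(cover, x))]


def backward_map(y, cover):
    """Toy inverse generator: demodulate and amplify the residual to recover x."""
    return [(v - c) / (EPS * carrier(i)) for i, (v, c) in enumerate(zip(y, cover))]


def l1(a, b):
    """Mean absolute difference, as used in an L1 cycle-consistency term."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


rng = random.Random(0)
n = 64
x = [rng.random() for _ in range(n)]       # "source image", values in [0, 1)
cover = [0.5] * n                          # what the output should look like
target = [rng.random() for _ in range(n)]  # attacker's chosen reconstruction

y = forward_map(x, cover)
visible_change = l1(y, cover)               # bounded by EPS: y looks like the cover
cycle_loss = l1(backward_map(y, cover), x)  # near zero: x is recovered

# Targeted attack: shift each element of y by less than EPS so that the
# reconstruction becomes `target` instead of x.
y_adv = [v + EPS * carrier(i) * (t - s)
         for i, (v, t, s) in enumerate(zip(y, target, x))]
attack_size = max(abs(a - b) for a, b in zip(y_adv, y))
attack_error = l1(backward_map(y_adv, cover), target)

print(visible_change, cycle_loss, attack_size, attack_error)
```

In this toy setting the reconstruction depends entirely on the hidden residual, which is why a perturbation no larger than the residual itself is enough to replace the recovered signal. A trained CycleGAN is far less explicit, but the abstract's argument follows the same logic.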
