StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation

Sidi Wu, Yizi Chen, Samuel Mermet, Lorenz Hurni, Konrad Schindler, Nicolas Gonthier, Loic Landrieu

TL;DR

StegoGAN tackles non-bijective image-to-image translation by introducing steganography in feature space to prevent hallucination of unmatchable classes. The method explicitly disentangles matchable and unmatchable information through a backward encoder–decoder and a learnable unmatchability mask, with forward translations augmented by unmatchable content only as needed. It adds targeted losses and a mask regularization to ensure sparse, interpretable masks and to enforce matchable-consistency across translations, all without paired supervision. Empirical results on PlanIGN, GoogleMaps, and Brats MRI show improved semantic fidelity and reduced false positives of unmatchable content compared with CycleGAN and other baselines, with ablations highlighting the pivotal role of L_reg and the mask mechanism. This approach advances reliable domain translation in real-world, asymmetric settings and opens avenues for applying steganography-inspired safeguards to other generative translation tasks.
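The mask-based disentanglement and the sparsity regularization described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the convention that a mask value of 1 marks unmatchable content, and the use of a mean absolute value as a stand-in for L_reg are all assumptions made for illustration. The sketch uses a single-channel feature map in plain Python.

```python
from typing import List, Tuple

# A single-channel H x W feature map (hypothetical simplification).
FeatureMap = List[List[float]]


def split_by_mask(features: FeatureMap, mask: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    """Split features into matchable and unmatchable parts.

    Assumption: mask values lie in [0, 1], and 1 marks unmatchable content.
    """
    matchable = [
        [f * (1.0 - m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]
    unmatchable = [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(features, mask)
    ]
    return matchable, unmatchable


def mask_sparsity_penalty(mask: FeatureMap) -> float:
    """Mean absolute mask value: a generic L1-style term standing in for L_reg."""
    values = [abs(m) for row in mask for m in row]
    return sum(values) / len(values)


features = [[1.0, 2.0], [3.0, 4.0]]
mask = [[0.0, 1.0], [0.5, 0.0]]

matchable, unmatchable = split_by_mask(features, mask)
penalty = mask_sparsity_penalty(mask)
```

In this sketch the two parts sum back to the original features, so nothing is lost by the split. A penalty of this kind pushes the mask toward zero, so content is marked unmatchable only where the other losses require it.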

Abstract

Most image-to-image translation models postulate that a unique correspondence exists between the semantic classes of the source and target domains. However, this assumption does not always hold in real-world scenarios due to divergent distributions, different class sets, and asymmetrical information representation. As conventional GANs attempt to generate images that match the distribution of the target domain, they may hallucinate spurious instances of classes absent from the source domain, thereby diminishing the usefulness and reliability of translated images. CycleGAN-based methods are also known to hide the mismatched information in the generated images to bypass cycle consistency objectives, a process known as steganography. In response to the challenge of non-bijective image translation, we introduce StegoGAN, a novel model that leverages steganography to prevent spurious features in generated images. Our approach enhances the semantic consistency of the translated images without requiring additional postprocessing or supervision. Our experimental evaluations demonstrate that StegoGAN outperforms existing GAN-based models across various non-bijective image-to-image translation tasks, both qualitatively and quantitatively. Our code and pretrained models are accessible at https://github.com/sian-wusidi/StegoGAN.

Paper Structure

This paper contains 21 sections, 12 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: Non-Bijective Translation. When image domains present classes without equivalence (a), GAN models tend to hallucinate spurious features when translating images (b). A related phenomenon is steganography, where CycleGAN-based models covertly encode features in low-amplitude patterns to bypass cycle consistency (c). Instead of disabling this phenomenon, we harness steganography to prevent the hallucination of spurious features.
  • Figure 2: Architecture. To avoid spurious generation of unmatchable classes in non-bijective image translation, we propose to make the steganographic process explicit and in feature space. Our model runs the backward cycle first (a), then the forward translation cycle (b). Thanks to our matchability disentanglement module (c), we can separate the matchable and unmatchable information while translating images from domain $\mathcal{Y}$ to $\mathcal{X}$. We can then produce generated and reconstructed images with and without unmatchable features. At inference time (d), our model operates like a normal image translation model.
  • Figure 3: Qualitative Comparison. We report reconstructions from the test sets of PlanIGN (top two rows), GoogleMaps (rows 3 and 4), and MRI (last two rows). Unlike the other models, StegoGAN does not hallucinate spurious toponyms, highways (orange roads), or tumors (white areas), and it shows better semantic correspondences during translation.
  • Figure 4: Results on GoogleMaps. We report the performance of several top-performing image translation models for different ratios of unmatchable features in the target domain of the training set. StegoGAN handles higher ratios better than competing methods.
  • Figure 5: Unmatchability Masks. The unmatchability masks predicted in the backward cycle follow the instances of unmatchable features in the target domain: toponyms, highways, and tumors.
  • ...and 7 more figures