Exploring Semantic Consistency in Unpaired Image Translation to Generate Data for Surgical Applications
Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, Marius Distler, Jürgen Weitz, Stefanie Speidel
TL;DR
This work tackles semantic distortion in unpaired image translation for surgical data augmentation. It introduces ConStructS, a framework that combines PatchNCE-based contrastive learning with a multi-scale structural similarity (MS-SSIM) loss to preserve structural content when translating from the synthetic to the real surgical domain. In extensive experiments on cholecystectomy and gastrectomy datasets, ConStructS achieves higher semantic consistency than multiple baselines, and its translated images are more useful as training data for downstream segmentation. The findings show that a simple loss combination can outperform more complex architectures in preserving anatomy while maintaining realism. This enables more reliable synthetic data generation for data-scarce medical applications. The approach is practically relevant for training surgical perception models when labeled data are scarce or privacy-constrained, and it points toward semantically robust data synthesis in medical imaging.
Abstract
In surgical computer vision applications, obtaining labeled training data is challenging due to data-privacy concerns and the need for expert annotation. Unpaired image-to-image translation techniques have been explored to automatically generate large annotated datasets by translating synthetic images to the realistic domain. However, preserving the structure and semantic consistency between the input and translated images is difficult, particularly when the semantic characteristics of the two domains follow different distributions. This study empirically investigates unpaired image translation methods for generating suitable data in surgical applications, explicitly focusing on semantic consistency. We extensively evaluate various state-of-the-art image translation models on two challenging surgical datasets and on downstream semantic segmentation tasks. We find that a simple combination of a structural-similarity loss and contrastive learning yields the most promising results. Quantitatively, we show that the data generated with this approach has higher semantic consistency and can be used more effectively as training data. The code is available at https://gitlab.com/nct_tso_public/constructs.
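To make the loss combination concrete, the following is a minimal, dependency-free sketch of the two terms named in the TL;DR: a PatchNCE-style contrastive loss and an MS-SSIM structural loss. It is not the ConStructS implementation (see the linked repository for that), and it rests on several assumptions. SSIM statistics are computed globally per scale rather than with local Gaussian windows. Only three scales are used, with truncated and renormalized MS-SSIM weights. Patch features are given as plain vectors instead of coming from an encoder. The temperature `tau = 0.07` is an assumed default. The structural loss is assumed to compare the synthetic input with its translation. All function names (`ms_ssim_global`, `patch_nce`, `structure_contrastive_loss`) and the weights `lam_nce`, `lam_ssim` are hypothetical, and the adversarial term is omitted.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of floats in [0, data_range]


def _stats(x: Image, y: Image):
    """Global means, variances, and covariance of two equally sized images."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return mx, my, vx, vy, cxy


def _downsample(img: Image) -> Image:
    """2x2 average pooling; an odd trailing row or column is dropped."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def ms_ssim_global(x: Image, y: Image,
                   weights: Sequence[float] = (0.0448, 0.2856, 0.3001),
                   data_range: float = 1.0) -> float:
    """Simplified MS-SSIM using global (not windowed) statistics at each scale."""
    if min(len(x), len(x[0])) < 2 ** len(weights):
        raise ValueError("image too small for the requested number of scales")
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    total_w = sum(weights)
    result = 1.0
    for k, raw_w in enumerate(weights):
        wk = raw_w / total_w  # renormalize the truncated weights to sum to 1
        mx, my, vx, vy, cxy = _stats(x, y)
        cs = (2 * cxy + c2) / (vx + vy + c2)  # contrast-structure term
        if k == len(weights) - 1:
            # Luminance enters only at the coarsest scale; clamping at 0
            # avoids complex results from fractional powers of negatives.
            lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
            result *= max(lum * cs, 0.0) ** wk
        else:
            result *= max(cs, 0.0) ** wk
            x, y = _downsample(x), _downsample(y)
    return result


def patch_nce(queries: List[List[float]], keys: List[List[float]],
              tau: float = 0.07) -> float:
    """InfoNCE over patches: queries[i] (translated image) should match
    keys[i] (same location in the input); all other keys act as negatives."""
    def l2norm(v: List[float]) -> List[float]:
        s = math.sqrt(sum(a * a for a in v)) or 1.0
        return [a / s for a in v]

    qs = [l2norm(q) for q in queries]
    ks = [l2norm(k) for k in keys]
    loss = 0.0
    for i, q in enumerate(qs):
        logits = [sum(a * b for a, b in zip(q, k)) / tau for k in ks]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]  # cross-entropy with target index i
    return loss / len(qs)


def structure_contrastive_loss(x_in: Image, x_out: Image,
                               feats_out: List[List[float]],
                               feats_in: List[List[float]],
                               lam_nce: float = 1.0,
                               lam_ssim: float = 1.0) -> float:
    """Hypothetical weighted sum; an adversarial loss would be added separately."""
    return (lam_nce * patch_nce(feats_out, feats_in)
            + lam_ssim * (1.0 - ms_ssim_global(x_in, x_out)))
```

For identical images the structural term vanishes, because `ms_ssim_global` returns 1. Aligned patch features drive the contrastive term toward zero, while mismatched features raise it. In actual training, both terms would be computed on tensors with automatic differentiation so that gradients reach the generator. The pure-Python version above only illustrates the arithmetic.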
